Getting Started with the Swift Frontend: Lexing & Parsing
A previous article in this series explained two primary ways of invoking the swift compiler executable: swift and swift -frontend.
- When invoking swift -frontend, the swift executable enters its main entry point and, once it sees the -frontend option, it begins to do everything you and I think of when we think of compilers: it attempts to lex the source file it's given, parse that file into a syntax tree, type-check it, produce an object file, and so on.
- When invoking just swift, without the -frontend option, the swift executable splits itself up into child invocations of swift -frontend. The logic that Swift uses to split itself up is in the libswiftDriver library.
Reading and Understanding the Swift Driver Source Code explains the libswiftDriver code that is executed in the second case. This article focuses on the first case: the "compiler-y" parts of the swift compiler executable.
In a nutshell, I aim to answer the question: "What happens when I compile this simple Swift program, hello.swift?"
hello.swift
    // hello.swift

    print("Hello, world!")
Parsing a Swift source file
First, I'll recap some details covered in previous articles. For example, I've explained that I can compile the hello.swift program on the command line, by invoking swiftc hello.swift. Even this simple invocation of swiftc, because it does not include the -frontend option, is split up into child jobs by the code in libswiftDriver. I can see these child jobs by invoking swiftc hello.swift -driver-print-jobs, which outputs something like the following:
    swift -frontend \
      -c hello.swift \
      -o /tmp/hello.o
    ld /tmp/hello.o \
      -lSystem -arch x86_64 -macosx_version_min 10.13.0 \
      -L /Users/bgesiak/Source/apple/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-macosx-x86_64/lib/swift/macosx \
      -rpath /Users/bgesiak/Source/apple/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-macosx-x86_64/lib/swift/macosx \
      -o hello
The first job invokes swift -frontend in order to produce an object file named hello.o, and the second job invokes the linker ld in order to link that object file into an executable named hello.
The first invocation appears to be very short, but make no mistake: it executes a lot of code, from a diverse set of libraries. These libraries include libswiftFrontend, libswiftParse, libswiftAST, libswiftSema, libswiftSIL, and more.
Covering each of these libraries in a single article would be exhausting. Instead, this article focuses on the first few phases of swift -frontend -c hello.swift. It'll cover libswiftFrontendTool, libswiftFrontend, and libswiftParse. These three libraries are used, in conjunction with libswiftAST, to build a tree structure that represents the untyped syntax tree of the Swift source file.
I can display the untyped syntax tree by invoking swiftc hello.swift -dump-parse, which outputs the following:
    (source_file
      (top_level_code_decl
        (brace_stmt
          (call_expr type='<null>' arg_labels=_:
            (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
            (paren_expr type='<null>'
              (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**))))))
Note the difference between the untyped tree above and the typed syntax tree that swiftc -dump-ast produces:
    (source_file
      (top_level_code_decl
        (brace_stmt
          (call_expr type='()' location=hello.swift:3:1 range=[hello.swift:3:1 - line:3:22] nothrow arg_labels=_:
            (declref_expr type='(Any..., String, String) -> ()' location=hello.swift:3:1 range=[hello.swift:3:1 - line:3:1] decl=Swift.(file).print(_:separator:terminator:) function_ref=single)
            (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=hello.swift:3:7 range=[hello.swift:3:6 - line:3:22] scalar_to_tuple elements=[-2, -1, -1] variadic_sources=[0] default_args_owner=Swift.(file).print(_:separator:terminator:)
              (paren_expr type='Any' location=hello.swift:3:7 range=[hello.swift:3:6 - line:3:22]
                (erasure_expr implicit type='Any' location=hello.swift:3:7 range=[hello.swift:3:7 - line:3:7]
                  (string_literal_expr type='String' location=hello.swift:3:7 range=[hello.swift:3:7 - line:3:7] encoding=utf8 value="Hello, world!" builtin_initializer=Swift.(file).String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:) initializer=**NULL**))))))))
Specifically, the untyped tree features nodes such as unresolved_decl_ref_expr, and many nodes also have a type='<null>' value. These are later filled in with type information as part of the type-checker, which is implemented in libswiftSema. I'll cover libswiftSema in a future article.
The first six stages of the Swift frontend: lexing & parsing
Just how does the swift executable read the text in the hello.swift file to construct an untyped syntax tree? How does it determine that print("Hello, world!") is a call_expr that wraps a paren_expr that wraps a string_literal_expr?
At a high level, the frontend completes parsing in about six stages:
- The main function in swift/tools/driver/driver.cpp sees that the first argument it's been passed is -frontend, and so it invokes the libswiftFrontendTool library's performFrontend function.
- performFrontend parses command-line arguments to determine the FrontendOptions::RequestedAction, by using the CompilerInvocation::parseArgs member function. It then instantiates CompilerInstance and ASTContext objects based on those arguments. Finally, it calls the libswiftFrontendTool function performCompile.
- performCompile uses the requested FrontendOptions::RequestedAction to determine whether to call CompilerInstance::performParseOnly or CompilerInstance::performSema. Our invocation of swift -frontend -c results in a requested action of ActionType::EmitObject, and so CompilerInstance::performSema is called.
- The CompilerInstance::performSema member function calls through to several other member functions. It opens a bitstream cursor into the standard library module Swift.swiftmodule (it'll use this cursor later, in order to determine the type of the parsed print(...) expression). It adds a SourceFile node to the root of the AST. It also calls the parseIntoSourceFile function, which instantiates a Parser and calls the Parser::parseTopLevel member function. This member function begins the process of lexing and parsing the text in the hello.swift source file.
- The Parser works in tandem with a Lexer, which it creates and stores internally as part of the Parser initializer. When the Parser::parseTopLevel member is called, that kicks off a loop that:
  - Lexes a "token". A token is a series of characters in the source file that form a cohesive unit. For example, the first token lexed in hello.swift is "print". That's because the lexer first sees a 'p', determines that it must be the start of an identifier (i.e.: an alphabetical character, followed by any number of alphanumeric characters or underscores), and so continues to read in characters until it sees a '(', at which point it stops.
  - The Parser looks at the token that the Lexer has lexed, and determines what to "parse." For example, when the Parser sees that the Lexer has lexed an identifier token of print, it asks the Lexer to lex the next token. Because that next token is a '(', it determines that it must be parsing a call expression. It asks the Lexer to lex the appropriate number of tokens in order to close the print(...) expression. Once it has, the Parser instantiates a new AST node: CallExpr.
- The Parser and Lexer continue their loop, instantiating new AST nodes, such as CallExpr or StringLiteralExpr, and adding them to the ASTContext. Eventually they reach the end of the source file. Normally at this point libswiftFrontend would continue by kicking off the type-checker for hello.swift. However, that's beyond the scope of this article, so we'll stop here.
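To make the Parser half of that loop concrete, here is a minimal sketch of how a parser might turn already-lexed tokens into a tree shaped like the -dump-parse output above. Everything in it (Kind, Token, Expr, the Parser class, and dump) is a hypothetical stand-in that I wrote for illustration; it is not the libswiftParse API, and it handles only the tiny grammar needed for print("Hello, world!").

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token and AST types, not the ones libswiftParse defines.
enum class Kind { identifier, l_paren, r_paren, string_literal, eof };
struct Token { Kind kind; std::string text; };

struct Expr {
  std::string label;                            // e.g. "call_expr"
  std::vector<std::unique_ptr<Expr>> children;  // sub-expressions, in order
};

class Parser {
  const std::vector<Token> &tokens;
  std::size_t index = 0;

  const Token &current() const { return tokens[index]; }
  void consume() { if (current().kind != Kind::eof) ++index; }

  static std::unique_ptr<Expr> node(std::string label) {
    std::unique_ptr<Expr> e(new Expr());
    e->label = std::move(label);
    return e;
  }

public:
  explicit Parser(const std::vector<Token> &toks) : tokens(toks) {
    assert(!tokens.empty() && tokens.back().kind == Kind::eof);
  }

  // expr := string_literal | identifier | identifier '(' expr ')'
  // Returns nullptr on a syntax error.
  std::unique_ptr<Expr> parseExpr() {
    if (current().kind == Kind::string_literal) {
      auto lit = node("string_literal_expr value=" + current().text);
      consume();
      return lit;
    }
    if (current().kind != Kind::identifier)
      return nullptr;
    auto ref = node("unresolved_decl_ref_expr name=" + current().text);
    consume();
    if (current().kind != Kind::l_paren)
      return ref;                      // a bare identifier, not a call
    consume();                         // identifier followed by '(' => call
    auto arg = parseExpr();
    if (!arg || current().kind != Kind::r_paren)
      return nullptr;
    consume();
    auto paren = node("paren_expr");
    paren->children.push_back(std::move(arg));
    auto call = node("call_expr");
    call->children.push_back(std::move(ref));
    call->children.push_back(std::move(paren));
    return call;
  }
};

// Prints the tree as nested parentheses, loosely imitating -dump-parse.
std::string dump(const Expr &e) {
  std::string out = "(" + e.label;
  for (const auto &child : e.children)
    out += " " + dump(*child);
  return out + ")";
}
```

Feeding this sketch the tokens print, (, "Hello, world!", ), and eof yields a call_expr node whose children are an unresolved_decl_ref_expr and a paren_expr wrapping a string_literal_expr, which is the same nesting the real dump shows.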
The rest of this article steps through the code behind these six steps, explaining them in more detail.
Stage 1: Parsing the -frontend argument
As explained in my libswiftDriver article, swift is just a C++ executable. Any invocation of a C++ executable begins in its main function. The swift executable's main function is defined in swift/tools/driver/driver.cpp.
Recall that one of the first things the Swift compiler's main function does is check for the first argument it's given. If that argument is -frontend, it calls the performFrontend function:
swift/tools/driver/driver.cpp
    int main(int argc_, const char **argv_) {
      ...
      StringRef FirstArg(argv[1]);
      if (FirstArg == "-frontend") {
        return performFrontend(llvm::makeArrayRef(argv.data()+2,
                                                  argv.data()+argv.size()),
                               argv[0], (void *)(intptr_t)getExecutablePath);
      }
Note that -frontend must be the first argument; invoking swift -c hello.swift -frontend is not considered a frontend invocation, and the performFrontend function will not be called. The machinery described in my previous article, Option Parsing in the Swift Compiler, hasn't been initialized yet, and so libswiftOption and libLLVMOption are not used here to check for swift::options::ID::OPT_frontend. Instead, this is a naive string comparison.
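The consequence of comparing only the first argument can be sketched in a few lines. The Mode enum and chooseMode function below are hypothetical names of my own, used only to illustrate the check; the real code calls performFrontend directly, as shown above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical labels for the two code paths the executable can take.
enum class Mode { Frontend, Driver };

// Only argv[1] is inspected, with plain string equality, before any
// option-parsing machinery has been set up.
Mode chooseMode(const std::vector<std::string> &argv) {
  assert(!argv.empty() && "argv[0] is the executable name");
  if (argv.size() > 1 && argv[1] == "-frontend")
    return Mode::Frontend;
  return Mode::Driver;
}
```

With this sketch, {"swift", "-frontend", "-c", "hello.swift"} selects the frontend path, while {"swift", "-c", "hello.swift", "-frontend"} does not, because -frontend is not in the first position.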
Stage 2: Instantiating a CompilerInstance based on command-line arguments
The performFrontend function is part of the libswiftFrontendTool library. libswiftFrontendTool is a tiny Swift compiler library whose single purpose is to drive the frontend compilation process.
The build code for libswiftFrontendTool – which determines the files that are included in the library, how it is built, and which libraries it depends on – is all defined in CMake. Check out the swift/lib/FrontendTool/CMakeLists.txt file to learn more.
There's nothing particularly exciting about the libswiftFrontendTool CMake, so I won't cover it here. A previous article, Reading and Understanding the CMake in apple/swift, has tips on how to read the CMake yourself.
The performFrontend function instantiates a CompilerInstance and a CompilerInvocation, and then performs argument parsing:
swift/lib/FrontendTool/FrontendTool.cpp
    int swift::performFrontend(ArrayRef<const char *> Args,
                               const char *Argv0, void *MainAddr,
                               FrontendObserver *observer) {
      ....
      std::unique_ptr<CompilerInstance> Instance =
        llvm::make_unique<CompilerInstance>();
      ....
      CompilerInvocation Invocation;
      ....
      // Parse arguments.
      if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
        return finishDiagProcessing(1);
      }
      ....
    }
As I mentioned in previous articles, I'll elide source code that isn't relevant to understanding the topic at hand: how the Swift compiler parses Swift source code. Besides what's shown above, the performFrontend function also calls functions like llvm::InitializeAllTargets. If it didn't, the Swift compiler would not be able to generate object files. But this article isn't about generating object files, so I won't include that code in these snippets.
The CompilerInstance class is arguably one of the most important classes in the Swift codebase. It's defined as part of the libswiftFrontend library. It's significant because it holds unique, owning references to several important singletons. For example, it holds a unique pointer to the ASTContext, which is a singleton that stores all AST nodes that are created by the compiler.
swift/include/swift/Frontend/Frontend.h
    /// A class which manages the state and execution of the compiler.
    /// This owns the primary compiler singletons, such as the ASTContext,
    /// as well as various build products such as the SILModule.
    ///
    /// Before a CompilerInstance can be used, it must be configured by
    /// calling \a setup. If successful, this will create an ASTContext
    /// and set up the basic compiler invariants. Calling \a setup multiple
    /// times on a single CompilerInstance is not permitted.
    class CompilerInstance {
      CompilerInvocation Invocation;
      SourceManager SourceMgr;
      DiagnosticEngine Diagnostics{SourceMgr};
      std::unique_ptr<ASTContext> Context;
      std::unique_ptr<SILModule> TheSILModule;
      ...
    };
The performFrontend function also instantiates a CompilerInvocation object. Not to be confused with CompilerInstance, the CompilerInvocation class is practically a glorified bag of options. It's responsible for parsing command-line arguments to swift -frontend, and storing their values in various "options" classes, like FrontendOptions or LangOptions.
swift/include/swift/Frontend/Frontend.h
    /// The abstract configuration of the compiler, including:
    ///   - options for all stages of translation,
    ///   - information about the build environment,
    ///   - information about the job being performed, and
    ///   - lists of inputs.
    ///
    /// A CompilerInvocation can be built from a frontend command line
    /// using parseArgs. It can then be used to build a CompilerInstance,
    /// which manages the actual compiler execution.
    class CompilerInvocation {
      LangOptions LangOpts;
      FrontendOptions FrontendOpts;
      ClangImporterOptions ClangImporterOpts;
      SearchPathOptions SearchPathOpts;
      DiagnosticOptions DiagnosticOpts;
      MigratorOptions MigratorOpts;
      SILOptions SILOpts;
      IRGenOptions IRGenOpts;
      ..
    public:
      ..
      /// Initializes the compiler invocation for the list of arguments.
      ..
      /// \returns true if there was an error, false on success.
      bool parseArgs(ArrayRef<const char *> Args, DiagnosticEngine &Diags,
                     StringRef workingDirectory = {});
      ..
    };
A CompilerInvocation object, then, is basically just a collection of parameters that are used to initialize the all-important CompilerInstance. The command-line arguments to swift -frontend are translated into the appropriate settings on a CompilerInvocation object, via its CompilerInvocation::parseArgs member function.
2.1: Parsing frontend command-line arguments
If you've read Option Parsing in the Swift Compiler, the body of the CompilerInvocation::parseArgs member function should look very familiar to you. It parses arguments using the exact same libswiftOption and libLLVMOption abstractions as the Swift driver does: createSwiftOptTable and llvm::opt::OptTable::ParseArgs.
swift/lib/Frontend/CompilerInvocation.cpp
    bool CompilerInvocation::parseArgs(ArrayRef<const char *> Args,
                                       DiagnosticEngine &Diags,
                                       StringRef workingDirectory) {
      ...
      std::unique_ptr<llvm::opt::OptTable> Table = createSwiftOptTable();
      llvm::opt::InputArgList ParsedArgs =
          Table->ParseArgs(Args, MissingIndex, MissingCount, FrontendOption);
      ...
      if (ParseFrontendArgs(FrontendOpts, ParsedArgs, Diags)) {
        return true;
      }

      if (ParseLangArgs(LangOpts, ParsedArgs, Diags, FrontendOpts)) {
        return true;
      }
      ...
      return false;
    }
After converting the command-line strings into an llvm::opt::InputArgList via the llvm::opt::OptTable::ParseArgs member function, the CompilerInvocation::parseArgs member function calls functions like ParseFrontendArgs, ParseLangArgs, and so on, in order to set values on members such as CompilerInvocation::LangOpts.
For example, ParseLangArgs is responsible for translating the swift::options::ID::OPT_swift_version stored on the llvm::opt::InputArgList, and using that to set CompilerInvocation::LangOpts::EffectiveLanguageVersion. If the version passed in is invalid, it emits an error diagnostic:
swift/lib/Frontend/CompilerInvocation.cpp
    static bool ParseLangArgs(LangOptions &Opts, ArgList &Args,
                              DiagnosticEngine &Diags,
                              const FrontendOptions &FrontendOpts) {
      ...
      if (auto A = Args.getLastArg(OPT_swift_version)) {
        auto vers = version::Version::parseVersionString(
          A->getValue(), SourceLoc(), &Diags);
        bool isValid = false;
        if (vers.hasValue()) {
          if (auto effectiveVers = vers.getValue().getEffectiveLanguageVersion()) {
            Opts.EffectiveLanguageVersion = effectiveVers.getValue();
            isValid = true;
          }
        }
        if (!isValid)
          diagnoseSwiftVersion(vers, A, Args, Diags);
      }
      ...
    }
To test this out, we can try passing swift -frontend an invalid language version, such as swift -frontend -c hello.swift -swift-version foo. This outputs:
    <unknown>:0: error: version component contains non-numeric characters
    <unknown>:0: error: invalid value 'foo' in '-swift-version foo'
    <unknown>:0: note: valid arguments to '-swift-version' are '3', '4', '5'
If you're curious what other combinations of Swift language versions are valid, you can take a look at the libswiftBasic functions that the ParseLangArgs function uses above: Version::parseVersionString and Version::getEffectiveLanguageVersion.
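The two-step shape of that validation (first parse the string into numeric components, then decide whether the result is an acceptable language version) can be sketched as follows. This is not the libswiftBasic implementation: parseVersionComponents and parseLanguageMajor are hypothetical helpers, and the rule "a major version of 3, 4, or 5 is accepted" is an assumption I inferred only from the note in the diagnostic output above. The real Version::getEffectiveLanguageVersion may well be more nuanced.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits "4.2" into {4, 2}.
// Returns true on error, the same convention parseArgs uses.
bool parseVersionComponents(const std::string &text,
                            std::vector<unsigned> &components) {
  components.clear();
  unsigned current = 0;
  std::size_t digits = 0;
  for (std::size_t i = 0; i <= text.size(); ++i) {
    if (i == text.size() || text[i] == '.') {
      if (digits == 0)
        return true;  // empty component, e.g. "" or "4."
      components.push_back(current);
      current = 0;
      digits = 0;
    } else if (std::isdigit(static_cast<unsigned char>(text[i])) &&
               digits < 6) {
      current = current * 10 + static_cast<unsigned>(text[i] - '0');
      ++digits;
    } else {
      return true;  // non-numeric character, or an implausibly long component
    }
  }
  return false;
}

// Assumed validity rule (see the lead-in): majors 3, 4, and 5 are accepted.
// Returns true on error; on success, writes the major version to `major`.
bool parseLanguageMajor(const std::string &text, unsigned &major) {
  std::vector<unsigned> components;
  if (parseVersionComponents(text, components))
    return true;
  assert(!components.empty() && "a successful parse yields a component");
  if (components[0] < 3 || components[0] > 5)
    return true;
  major = components[0];
  return false;
}
```

In this sketch "foo" fails in the first step (non-numeric characters), whereas "2" parses fine but is rejected in the second step, which mirrors the two separate error lines in the output above.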
One of the most important parts of this argument parsing is done in the ParseFrontendArgs function. This calls through to the ArgsToFrontendOptionsConverter::determineRequestedAction member function, in order to set the CompilerInvocation object's FrontendOptions::RequestedAction, based on whether the frontend was invoked with -emit-object, -emit-sil, or some other option. This "requested action" will determine what logic the frontend executes:
swift/lib/Frontend/ArgsToFrontendOptionsConverter.cpp
    FrontendOptions::ActionType
    ArgsToFrontendOptionsConverter::determineRequestedAction() const {
      using namespace options;
      const Arg *A = Args.getLastArg(OPT_modes_Group);
      ...
      Option Opt = A->getOption();
      if (Opt.matches(OPT_emit_object))
        return FrontendOptions::ActionType::EmitObject;
      if (Opt.matches(OPT_emit_assembly))
        return FrontendOptions::ActionType::EmitAssembly;
      if (Opt.matches(OPT_emit_ir))
        return FrontendOptions::ActionType::EmitIR;
      ...
      if (Opt.matches(OPT_dump_parse))
        return FrontendOptions::ActionType::DumpParse;
      if (Opt.matches(OPT_dump_ast))
        return FrontendOptions::ActionType::DumpAST;
      ...
      llvm_unreachable("Unhandled mode option");
    }
My invocation of swift -frontend -c hello.swift does not appear to include the argument OPT_emit_object. However, a quick peek at Options.td (covered in depth in the article on Option Parsing in the Swift Compiler) reveals that -c is an alias for -emit-object:
swift/include/swift/Option/Options.td
    def c : Flag<["-"], "c">, Alias<emit_object>,
      Flags<[FrontendOption, NoInteractiveOption]>, ModeOpt;
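As a rough sketch of why -c ends up as EmitObject: the alias is resolved to the option it stands for before the mode is matched, and the last mode flag on the command line wins. The code below is a simplified model using plain strings rather than libLLVMOption; resolveAlias is a hypothetical helper, and the Unspecified enumerator is my own placeholder for "no mode flag was given", not something taken from the Swift source.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A subset of the action types named above. "Unspecified" is a hypothetical
// placeholder used by this sketch only.
enum class ActionType {
  Unspecified, EmitObject, EmitAssembly, EmitIR, DumpParse, DumpAST
};

// Hypothetical alias table with a single entry: -c stands for -emit-object.
std::string resolveAlias(const std::string &flag) {
  return flag == "-c" ? std::string("-emit-object") : flag;
}

// Scans every argument and lets the last mode flag win, similar in spirit
// to calling getLastArg on the group of mode options.
ActionType determineRequestedAction(const std::vector<std::string> &args) {
  ActionType action = ActionType::Unspecified;
  for (const std::string &arg : args) {
    const std::string flag = resolveAlias(arg);
    if (flag == "-emit-object")
      action = ActionType::EmitObject;
    else if (flag == "-emit-assembly")
      action = ActionType::EmitAssembly;
    else if (flag == "-emit-ir")
      action = ActionType::EmitIR;
    else if (flag == "-dump-parse")
      action = ActionType::DumpParse;
    else if (flag == "-dump-ast")
      action = ActionType::DumpAST;
  }
  return action;
}
```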
2.2: Instantiating the ASTContext via the CompilerInstance::setup member function
After the arguments have been parsed, the performFrontend function continues by finishing the initialization of the CompilerInstance. To do so, it calls the member function CompilerInstance::setup:
swift/lib/FrontendTool/FrontendTool.cpp
    int swift::performFrontend(ArrayRef<const char *> Args,
                               const char *Argv0, void *MainAddr,
                               FrontendObserver *observer) {
      ....
      std::unique_ptr<CompilerInstance> Instance =
        llvm::make_unique<CompilerInstance>();
      ....
      CompilerInvocation Invocation;
      ....
      // Parse arguments.
      if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
        return finishDiagProcessing(1);
      }
      ....
      if (Instance->setup(Invocation)) {
        return finishDiagProcessing(1);
      }
      ....
    }
CompilerInstance::setup allocates a new ASTContext. ASTContext is responsible for creating, allocating memory for, and owning the nodes of the syntax tree.
swift/lib/Frontend/Frontend.cpp
    bool CompilerInstance::setup(const CompilerInvocation &Invok) {
      Invocation = Invok;
      ..
      Context.reset(new ASTContext(Invocation.getLangOptions(),
                                   Invocation.getSearchPathOptions(), SourceMgr,
                                   Diagnostics));

      ..
    }
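The idea of one object that creates and owns every node, while the rest of the code passes around non-owning pointers, can be sketched like this. The sketch swaps in plain std::unique_ptr ownership for simplicity; it makes no claim about how ASTContext manages memory internally, and Node, StringLiteral, and Context are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for every node in the tree.
struct Node {
  virtual ~Node() = default;
};

struct StringLiteral : Node {
  std::string value;
  explicit StringLiteral(std::string v) : value(std::move(v)) {}
};

// A stand-in for the "creates and owns the AST objects" role: nodes are only
// ever created through the context, and they live as long as it does.
class Context {
  std::vector<std::unique_ptr<Node>> nodes;

public:
  template <typename T, typename... Args>
  T *create(Args &&...args) {
    std::unique_ptr<T> owned(new T(std::forward<Args>(args)...));
    T *raw = owned.get();               // non-owning handle for callers
    nodes.push_back(std::move(owned));  // the context keeps ownership
    return raw;
  }

  std::size_t size() const { return nodes.size(); }
};
```

A caller would write something like ctx.create<StringLiteral>("Hello, world!") and hold on to the returned raw pointer, never deleting it; every node is destroyed when the Context goes out of scope.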
There's no overstating the importance of the ASTContext class; if you do any work on the Swift compiler, you'll see this class used everywhere.
swift/include/swift/AST/ASTContext.h
    /// ASTContext - This object creates and owns the AST objects.
    /// However, this class does more than just maintain context within an AST.
    /// It is the closest thing to thread-local or compile-local storage in this
    /// code base. Why? SourceKit uses this code with multiple threads per Unix
    /// process. Each thread processes a different source file. Each thread has its
    /// own instance of ASTContext, and that instance persists for the duration of
    /// the thread, throughout all phases of the compilation. (The name "ASTContext"
    /// is a bit of a misnomer here.) Why not use thread-local storage? This code
    /// may use DispatchQueues and pthread-style TLS won't work with code that uses
    /// DispatchQueues. Summary: if you think you need a global or static variable,
    /// you probably need to put it here instead.

    class ASTContext {
      ...
    };
I'll write more on ASTContext below. For now, let's return to performFrontend. Having called CompilerInstance::setup, it calls performCompile.
swift/lib/FrontendTool/FrontendTool.cpp
    int swift::performFrontend(ArrayRef<const char *> Args,
                               const char *Argv0, void *MainAddr,
                               FrontendObserver *observer) {
      ....
      std::unique_ptr<CompilerInstance> Instance =
        llvm::make_unique<CompilerInstance>();
      ....
      CompilerInvocation Invocation;
      ....
      // Parse arguments.
      if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
        return finishDiagProcessing(1);
      }
      ....
      if (Instance->setup(Invocation)) {
        return finishDiagProcessing(1);
      }
      ....
      int ReturnValue = 0;
      bool HadError =
        performCompile(*Instance, Invocation, Args, ReturnValue, observer,
                       StatsReporter.get());
      ....
    }
Stage 3: Kicking off libswiftParse (and libswiftSema)
The performCompile function (also defined as part of the libswiftFrontendTool library) looks at the FrontendOptions::RequestedAction (set in stage 2.1 above) and calls either CompilerInstance::performParseOnly or CompilerInstance::performSema.
swift/lib/FrontendTool/FrontendTool.cpp
    static bool performCompile(CompilerInstance &Instance,
                               CompilerInvocation &Invocation,
                               ArrayRef<const char *> Args,
                               int &ReturnValue,
                               FrontendObserver *observer,
                               UnifiedStatsReporter *Stats) {
      FrontendOptions opts = Invocation.getFrontendOptions();
      FrontendOptions::ActionType Action = opts.RequestedAction;
      ...
      if (Action == FrontendOptions::ActionType::Parse ||
          Action == FrontendOptions::ActionType::DumpParse ||
          Action == FrontendOptions::ActionType::EmitSyntax ||
          Action == FrontendOptions::ActionType::DumpInterfaceHash ||
          Action == FrontendOptions::ActionType::EmitImportedModules)
        Instance.performParseOnly();
      else
        Instance.performSema();
      ...
    }
As explained in stage 2.1 above, when I invoke swift -frontend -c hello.swift, it results in a request for ActionType::EmitObject, so the CompilerInstance::performSema member function is called.
Stage 4: Opening a Swift.swiftmodule bitstream cursor and kicking off the parsing loop
The CompilerInstance::performSema member function loads the Swift standard library and then calls through to CompilerInstance::parseAndCheckTypes:
swift/lib/Frontend/Frontend.cpp
    void CompilerInstance::performSema() {
      ...
      if (!loadStdlib())
        return;
      ...
      if (MainBufferID != NO_SUCH_BUFFER)
        addMainFileToModule(implicitImports);

      parseAndCheckTypes(implicitImports);
    }
The CompilerInstance::loadStdlib member function opens a cursor into the Swift standard library module file. Swift module files are Swift ASTs, serialized into a binary format called an LLVM bitstream. Later, when libswiftSema type-checks the hello.swift file, it will look up the print function using the cursor created here.
This article only covers parsing, not type-checking. I'll write more about the CompilerInstance::loadStdlib member function, as well as about Swift modules in general, in a future article, which I'll publish before writing about the Swift type-checker.
Next, CompilerInstance::performSema calls CompilerInstance::addMainFileToModule, which calls through to CompilerInstance::createSourceFileForMainModule.
swift/lib/Frontend/Frontend.cpp
    SourceFile *CompilerInstance::createSourceFileForMainModule(
        SourceFileKind fileKind, SourceFile::ImplicitModuleImportKind importKind,
        Optional<unsigned> bufferID) {
      ...
      SourceFile *inputFile = new (*Context)
          SourceFile(*mainModule, fileKind, bufferID, importKind, keepSyntaxInfo);
      MainModule->addFile(*inputFile);
      ...
      return inputFile;
    }
This creates the SourceFile node at the root of the AST that is printed out when invoking swiftc -dump-parse hello.swift. The SourceFile class derives from the DeclContext class, which is used in the AST code as a container for arbitrary declarations.
Back in CompilerInstance::performSema, a call to CompilerInstance::parseAndCheckTypes is made, which then calls through to CompilerInstance::parseAndTypeCheckMainFile. This calls the parseIntoSourceFile function multiple times – as many times as that function continues to find Swift code at the top level of the file to parse, until it hits the end of the file.
swift/lib/Frontend/Frontend.cpp
    void CompilerInstance::parseAndTypeCheckMainFile(
        PersistentParserState &PersistentState,
        DelayedParsingCallbacks *DelayedParseCB,
        OptionSet<TypeCheckingFlags> TypeCheckOptions) {
      ...
      bool Done;
      do {
        ...
        parseIntoSourceFile(MainFile, MainFile.getBufferID().getValue(), &Done,
                            TheSILModule ? &SILContext : nullptr, &PersistentState,
                            DelayedParseCB);
        ...
      } while (!Done);
      ...
    }
The parseIntoSourceFile function is defined as part of a tiny compiler library named libswiftParseSIL (the function was recently split out into its own library in order to break circular library dependencies in the compiler). It instantiates a Parser and calls Parser::parseTopLevel. Note that it sets the Done pointer based on whether it's found the eof "token":
swift/lib/ParseSIL/ParseSIL.cpp
    bool swift::parseIntoSourceFile(SourceFile &SF,
                                    unsigned BufferID,
                                    bool *Done,
                                    SILParserState *SIL,
                                    PersistentParserState *PersistentState,
                                    DelayedParsingCallbacks *DelayedParseCB) {
      ...
      Parser P(BufferID, SF, SIL ? SIL->Impl.get() : nullptr, PersistentState);
      ...
      bool FoundSideEffects = P.parseTopLevel();
      *Done = P.Tok.is(tok::eof);

      return FoundSideEffects;
    }
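The calling pattern here, a do/while loop that keeps invoking a function until it reports completion through an out-parameter, can be sketched in isolation. In the sketch below a "top-level chunk" is simply one line of text, which is an assumption made to keep the example short; ParserState, parseOneChunk, and parseWholeBuffer are hypothetical names, not Swift compiler APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for state that must persist between calls.
struct ParserState {
  std::size_t offset = 0;
};

// Parses one chunk (here: one line) per call, and reports through `done`
// whether the end of the buffer has been reached.
void parseOneChunk(const std::string &buffer, ParserState &state,
                   std::vector<std::string> &chunks, bool *done) {
  assert(done && "caller must supply a Done flag");
  std::size_t end = buffer.find('\n', state.offset);
  if (end == std::string::npos)
    end = buffer.size();
  if (end > state.offset)  // skip empty lines
    chunks.push_back(buffer.substr(state.offset, end - state.offset));
  state.offset = end < buffer.size() ? end + 1 : end;
  *done = state.offset >= buffer.size();
}

// The caller owns the loop, just as parseAndTypeCheckMainFile does above.
std::vector<std::string> parseWholeBuffer(const std::string &buffer) {
  ParserState state;
  std::vector<std::string> chunks;
  bool done = false;
  do {
    parseOneChunk(buffer, state, chunks, &done);
  } while (!done);
  return chunks;
}
```

One reason a loop like this is useful is that the caller gets a chance to do other work between chunks, which is what the elided lines inside the do/while body above are for.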
4.1: More on the Lexer and tokens
The Parser initializer that's used in the parseIntoSourceFile function creates a new Lexer object:
swift/lib/Parse/Parser.cpp
    Parser::Parser(unsigned BufferID, SourceFile &SF, SILParserTUStateBase *SIL,
                   PersistentParserState *PersistentState)
        : Parser(
              std::unique_ptr<Lexer>(new Lexer(
                  SF.getASTContext().LangOpts, SF.getASTContext().SourceMgr,
                  BufferID, &SF.getASTContext().Diags,
                  /*InSILMode=*/SIL != nullptr,
                  SF.getASTContext().LangOpts.AttachCommentsToDecls
                      ? CommentRetentionMode::AttachToNextToken
                      : CommentRetentionMode::None,
                  SF.shouldKeepSyntaxInfo()
                      ? TriviaRetentionMode::WithTrivia
                      : TriviaRetentionMode::WithoutTrivia)),
              SF, SIL, PersistentState) {}
A Lexer is responsible for reading in the individual characters in a source file and forming logical chunks, called tokens.
The Clang compiler, which compiles C, C++, and Objective-C source code, can print the tokens it lexes, using the -dump-tokens frontend option. For example, consider the following simple C program hello.c:
    int main() {
      return 0;
    }
I can invoke the Clang frontend, clang -cc1, to dump the tokens in this file (Clang has a driver and frontend system that is nearly identical to Swift's, except that Clang takes the argument -cc1 instead of -frontend). clang -cc1 -dump-tokens hello.c outputs the following:
    int 'int' Loc=<hello.c:1:1> [StartOfLine]
    identifier 'main' Loc=<hello.c:1:5> [LeadingSpace]
    l_paren '(' Loc=<hello.c:1:9>
    r_paren ')' Loc=<hello.c:1:10>
    l_brace '{' Loc=<hello.c:1:12> [LeadingSpace]
    return 'return' Loc=<hello.c:2:3> [StartOfLine] [LeadingSpace]
    numeric_constant '0' Loc=<hello.c:2:10> [LeadingSpace]
    semi ';' Loc=<hello.c:2:11>
    r_brace '}' Loc=<hello.c:3:1> [StartOfLine]
    eof '' Loc=<hello.c:3:2>
The Swift compiler executable does not have an option to print the tokens in a file (although if any readers are interested in contributing, this would be a great feature to add!), but if it did, the tokens in hello.swift would be output like this:
    identifier 'print' Loc=<hello.swift:3:1> [StartOfLine]
    l_paren '(' Loc=<hello.swift:3:6>
    string_literal '"Hello, world!"' Loc=<hello.swift:3:7>
    r_paren ')' Loc=<hello.swift:3:22>
    eof '' Loc=<hello.swift:3:23>
Note that the comment at the top of the file, // hello.swift, and the empty line below that comment, are not represented as tokens. The Swift compiler can be invoked such that it creates tokens for comments and whitespace, but normally they are discarded entirely by the compiler.
The first column of the output above displays the token "kind". Kinds of Swift tokens include identifier, l_paren, and eof. You can find a list of all the different Swift token kinds in swift/include/swift/Syntax/TokenKinds.def. An enum of all the different token kinds is defined in swift/include/swift/Syntax/TokenKinds.h, using a trick readers of my option parsing article should be familiar with: it defines the TOKEN macro, and then includes the TokenKinds.def file, which contains a call to TOKEN for each token kind.
swift/include/swift/Syntax/TokenKinds.h
    enum class tok {
    #define TOKEN(X) X,
    #include "swift/Syntax/TokenKinds.def"

      NUM_TOKENS
    };
This creates an enum case for each token kind. For the if keyword, for example, the enum case is named tok::kw_if:
swift/include/swift/Syntax/TokenKinds.def
    /// KEYWORD(kw)
    ///   Expands by default for every Swift keyword and every SIL keyword, such as
    ///   'if', 'else', 'sil_global', etc. If you only want to use Swift keywords
    ///   see SWIFT_KEYWORD.
    #ifndef KEYWORD
    #define KEYWORD(kw) TOKEN(kw_ ## kw)
    #endif

    /// SWIFT_KEYWORD(kw)
    ///   Expands for every Swift keyword.
    #ifndef SWIFT_KEYWORD
    #define SWIFT_KEYWORD(kw) KEYWORD(kw)
    #endif
    ..
    /// STMT_KEYWORD(kw)
    ///   Expands for every Swift keyword used in statement grammar.
    #ifndef STMT_KEYWORD
    #define STMT_KEYWORD(kw) SWIFT_KEYWORD(kw)
    #endif
    ...
    STMT_KEYWORD(if)
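For readers who haven't seen this macro trick before, here is a self-contained sketch of it. To fit in a single file, the sketch keeps the token list in a macro (SKETCH_TOKENS) instead of a separate .def file that gets included, so it is a variation on what TokenKinds.h does rather than a copy of it. All SKETCH_ names and the spelling function are my own, hypothetical inventions.

```cpp
#include <cassert>
#include <string>

// Stand-in for TokenKinds.def: one entry per token kind. The two parameters
// are macros that decide what each entry expands to.
#define SKETCH_TOKENS(TOKEN, KEYWORD) \
  TOKEN(identifier) \
  TOKEN(l_paren) \
  TOKEN(r_paren) \
  TOKEN(eof) \
  KEYWORD(if) \
  KEYWORD(else)

// Expansion 1: every entry becomes an enum case; keywords get a kw_ prefix.
enum class tok {
#define SKETCH_TOKEN(X) X,
#define SKETCH_KEYWORD(X) kw_##X,
  SKETCH_TOKENS(SKETCH_TOKEN, SKETCH_KEYWORD)
#undef SKETCH_TOKEN
#undef SKETCH_KEYWORD
  NUM_TOKENS
};

// Expansion 2: only keyword entries produce a case label.
bool isKeyword(tok kind) {
  switch (kind) {
#define SKETCH_TOKEN(X)
#define SKETCH_KEYWORD(X) case tok::kw_##X: return true;
    SKETCH_TOKENS(SKETCH_TOKEN, SKETCH_KEYWORD)
#undef SKETCH_TOKEN
#undef SKETCH_KEYWORD
  default: return false;
  }
}

// Expansion 3: every entry maps to the name of its enum case.
std::string spelling(tok kind) {
  assert(kind != tok::NUM_TOKENS && "NUM_TOKENS is a count, not a token");
  switch (kind) {
#define SKETCH_TOKEN(X) case tok::X: return #X;
#define SKETCH_KEYWORD(X) case tok::kw_##X: return "kw_" #X;
    SKETCH_TOKENS(SKETCH_TOKEN, SKETCH_KEYWORD)
#undef SKETCH_TOKEN
#undef SKETCH_KEYWORD
  default: return "<invalid>";
  }
}
```

The appeal of the approach is that adding a token kind means touching one list, and the enum, the keyword check, and the name table all stay in sync.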
The Token class stores information about a token: its kind and its text. It also defines member functions such as Token::is, so that other parts of the compiler can quickly check "is this token an if keyword?", by invoking Tok.is(tok::kw_if). Or, to check simply that the token is a keyword, the Token::isKeyword member function is implemented using a macro and an include:
swift/include/swift/Parse/Token.h
    class Token {
      /// Kind - The actual flavor of token this is.
      ///
      tok Kind;

      /// Text - The actual string covered by the token in the source buffer.
      StringRef Text;
      ..
    public:
      ..
      /// is/isNot - Predicates to check if this token is a specific kind, as in
      /// "if (Tok.is(tok::l_brace)) {...}".
      bool is(tok K) const { return Kind == K; }
      ...
      /// True if the token is any keyword.
      bool isKeyword() const {
        switch (Kind) {
    #define KEYWORD(X) case tok::kw_##X: return true;
    #include "swift/Syntax/TokenKinds.def"
        default: return false;
        }
      }
      ...
    };
4.2: Priming the Lexer to form a token for the identifier "print"
Normally the Parser prompts the Lexer to lex the next token in a file. The Lexer also, when it's initialized, lexes the first token in the file by calling the Lexer::primeLexer member function.
swift/include/swift/Parse/Lexer.h
    class Lexer {
      ...
      Lexer(const LangOptions &Options,
            const SourceManager &SourceMgr, unsigned BufferID,
            DiagnosticEngine *Diags, bool InSILMode,
            CommentRetentionMode RetainComments = CommentRetentionMode::None,
            TriviaRetentionMode TriviaRetention = TriviaRetentionMode::WithoutTrivia)
          : Lexer(Options, SourceMgr, Diags, BufferID, InSILMode, RetainComments,
                  TriviaRetention) {
        primeLexer();
      }
      ...
    };
Lexer::primeLexer
calls through to Lexer::lexImpl
. This member function implements the core of the lexing functionality in the Swift compiler; it is the method that, based on the first character in a series, determines whether to lex an identifier, a number literal, an operator, or some other token.
The Lexer
keeps a pointer to the character it's currently lexing using a member Lexer::CurPtr
. Lexer::lexImpl
moves the pointer to the next character and then enters a switch
statement based on the character's value. Here are the switch
statement cases that are relevant to my source file hello.swift
:
swift/lib/Parse/Lexer.cpp
void Lexer::lexImpl() {
  ....
Restart:
  ....
  switch ((signed char)*CurPtr++) {
  ....
  case ' ':
  case '\t':
  case '\f':
  case '\v':
    goto Restart;  // Skip whitespace.
  ....
  case '(': return formToken(tok::l_paren, TokStart);
  case '}': return formToken(tok::r_brace, TokStart);
  case ']': return formToken(tok::r_square, TokStart);
  case ')':
    return formToken(tok::r_paren, TokStart);
  ....
  case '/':
    if (CurPtr[0] == '/') {  // "//"
      skipSlashSlashComment(/*EatNewline=*/true);
      ....
      goto Restart;
    }
  ....
  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
  case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
  case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
  case 'V': case 'W': case 'X': case 'Y': case 'Z':
  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
  case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
  case 'v': case 'w': case 'x': case 'y': case 'z':
  case '_':
    return lexIdentifier();
  ....
  case '"':
  case '\'':
    return lexStringLiteral();
  ....
}
When the Lexer
is first initialized in the frontend when parsing hello.swift
, the Lexer::primeLexer
member function is called, which calls Lexer::lexImpl
. This skips the //
comments and whitespace at the beginning of hello.swift
, until it reaches the 'p'
character in "print"
, and so it enters the case in which the Lexer::lexIdentifier
member function is called. This function consumes characters as long as they are valid identifier characters, and then creates a Token
with a token kind of either tok::identifier
, or a keyword token kind like tok::kw_if
.
swift/lib/Parse/Lexer.cpp
/// lexIdentifier - Match [a-zA-Z_][a-zA-Z_$0-9]*
void Lexer::lexIdentifier() {
  const char *TokStart = CurPtr-1;
  CurPtr = TokStart;
  bool didStart = advanceIfValidStartOfIdentifier(CurPtr, BufferEnd);
  ...
  // Lex [a-zA-Z_$0-9[[:XID_Continue:]]]*
  while (advanceIfValidContinuationOfIdentifier(CurPtr, BufferEnd));

  tok Kind = kindOfIdentifier(StringRef(TokStart, CurPtr-TokStart), InSILMode);
  return formToken(Kind, TokStart);
}
In the case of hello.swift, Lexer::lexIdentifier advances from 'p' to the end of "print", stopping when it sees the '(' character immediately after "print". It then calls Lexer::kindOfIdentifier to determine whether "print" is a keyword. Sure enough, the Lexer::kindOfIdentifier member function uses the same macro and include trick:
swift/lib/Parse/Lexer.cpp
tok Lexer::kindOfIdentifier(StringRef Str, bool InSILMode) {
  tok Kind = llvm::StringSwitch<tok>(Str)
#define KEYWORD(kw) \
    .Case(#kw, tok::kw_##kw)
#include "swift/Syntax/TokenKinds.def"
    .Default(tok::identifier);
  ...
  return Kind;
}
So, as soon as the Lexer
is initialized, the Lexer::primeLexer
member function results in the Lexer::NextToken
member being set to a token with kind tok::identifier
, and with the text "print"
.
The next time the Parser
requests a token from its Lexer
, the Lexer
will immediately return to it this "print"
token and, if it is not at the end of the file, it'll prepare the next Lexer::NextToken
in the same way, by calling Lexer::lexImpl
again.
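To make the priming behavior concrete, here is a minimal sketch of a lexer that always keeps one token lexed ahead. It is not the Swift Lexer: MiniLexer, Kind, Token, and peekNextToken are hypothetical names, the token kinds are reduced to the handful that hello.swift needs, and string literals are handled without escapes or interpolation.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical, heavily reduced set of token kinds.
enum class Kind { identifier, l_paren, r_paren, string_literal, unknown, eof };

struct Token {
  Kind kind;
  std::string text;
};

class MiniLexer {
public:
  explicit MiniLexer(std::string source) : buffer(std::move(source)) {
    lexImpl(); // "Prime" the lexer: nextToken is valid before anyone asks.
  }

  // Hand back the token that is already lexed, then prepare the following one.
  Token lex() {
    Token result = nextToken;
    if (result.kind != Kind::eof)
      lexImpl();
    return result;
  }

  const Token &peekNextToken() const { return nextToken; }

private:
  static bool isIdentifierStart(char c) {
    return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
  }
  static bool isIdentifierContinuation(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
  }

  // Store the characters in [start, cur) as the next token.
  void form(Kind kind, std::size_t start) {
    nextToken = Token{kind, buffer.substr(start, cur - start)};
  }

  void lexImpl() {
    for (;;) {
      assert(cur <= buffer.size() && "cursor ran past the buffer");
      if (cur == buffer.size()) {
        nextToken = Token{Kind::eof, ""};
        return;
      }
      const std::size_t start = cur;
      const char c = buffer[cur++];
      switch (c) {
      case ' ': case '\t': case '\n': case '\r':
        continue; // Skip whitespace and restart the loop.
      case '(':
        return form(Kind::l_paren, start);
      case ')':
        return form(Kind::r_paren, start);
      case '/':
        if (cur < buffer.size() && buffer[cur] == '/') {
          while (cur < buffer.size() && buffer[cur] != '\n')
            ++cur; // Skip a "//" comment up to the end of the line.
          continue;
        }
        return form(Kind::unknown, start);
      case '"':
        while (cur < buffer.size() && buffer[cur] != '"')
          ++cur;
        if (cur < buffer.size())
          ++cur; // Consume the closing quote.
        return form(Kind::string_literal, start);
      default:
        if (!isIdentifierStart(c))
          return form(Kind::unknown, start);
        while (cur < buffer.size() && isIdentifierContinuation(buffer[cur]))
          ++cur;
        return form(Kind::identifier, start);
      }
    }
  }

  std::string buffer;
  std::size_t cur = 0;
  Token nextToken{Kind::eof, ""};
};
```

Constructing a MiniLexer over the contents of hello.swift leaves the "print" identifier waiting in nextToken, and each call to lex() returns the waiting token while lexing the one after it.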
Stage 5: The lex & parse loop
To recap: the libswiftFrontendTool function performFrontend
parsed the arguments passed into our swift -frontend
invocation and determined it would call the CompilerInstance::performSema
member function. That eventually called the parseIntoSourceFile
function, which instantiated a Parser
and its internal Lexer
(which resulted in Lexer
being "primed" with the "print"
token). Finally, parseIntoSourceFile
calls Parser::parseTopLevel
, which kicks off the lexing and parsing loop that results in the full syntax tree being constructed for our file.
swift/lib/Parse/ParseDecl.cpp
bool Parser::parseTopLevel() {
  ...
  consumeTokenWithoutFeedingReceiver();
  ...
  parseBraceItems(Items,
                  allowTopLevelCode() ? BraceItemListKind::TopLevelCode
                                      : BraceItemListKind::TopLevelLibrary);
  ...
}
The Parser::parseTopLevel
member function first requests the next token from the Lexer
, by calling consumeTokenWithoutFeedingReceiver
. This sets member Parser::Tok
to the token for "print"
, and has the Lexer
ready the next token, which is a tok::l_paren
'('
.
It then calls Parser::parseBraceItems. "Brace items" here refers to a sequence of expressions or statements that are contained within a pair of braces { ... }. Our hello.swift file does not contain braces, but when parsing top-level code in a file the Swift parser pretends that there is a set of braces wrapping that top-level code.
Parser::parseBraceItems
enters a while loop that continues to parse the file until it reaches the file's end. Within that loop, it creates AST nodes for expressions or statements that it finds, by calling Parser::parseExprOrStmt
. If it finds an expression or statement, it wraps it in a BraceStmt
node:
swift/lib/Parse/ParseStmt.cpp
ParserStatus Parser::parseBraceItems(SmallVectorImpl<ASTNode> &Entries,
                                     BraceItemListKind Kind,
                                     BraceItemListKind ConditionalBlockKind) {
  ...
  while ((IsTopLevel || Tok.isNot(tok::r_brace)) &&
         ...
         Tok.isNot(tok::eof) &&
         ...
         !isTerminatorForBraceItemListKind(Kind, Entries))) {
    ...
      // If this is a statement or expression at the top level of the module,
      // Parse it as a child of a TopLevelCodeDecl.
      auto *TLCD = new (Context) TopLevelCodeDecl(CurDeclContext);
      ...
      ParserStatus Status = parseExprOrStmt(Result);
      ...
      if (!Result.isNull()) {
        // NOTE: this is a 'virtual' brace statement which does not have
        //       explicit '{' or '}', so the start and end locations should be
        //       the same as those of the result node
        auto Brace = BraceStmt::create(Context, Result.getStartLoc(),
                                       Result, Result.getEndLoc());
        TLCD->setBody(Brace);
        ...
      }
    ...
  }
  ...
}
With the creation of a TopLevelCodeDecl
and a BraceStmt
to wrap the result of the Parser::parseExprOrStmt
member function call, the AST for hello.swift
would look like this:
(source_file (top_level_code_decl (brace_stmt ...)))
The actual expression print("Hello, world!")
will be parsed as part of the call to Parser::parseExprOrStmt
.
5.1: A quick note on custom allocators in C++
Before getting into the Parser::parseExprOrStmt
member function, take a closer look at the instantiation of TopLevelCodeDecl
, which uses an interesting C++ feature:
swift/lib/Parse/ParseStmt.cpp
ParserStatus Parser::parseBraceItems(SmallVectorImpl<ASTNode> &Entries,
                                     BraceItemListKind Kind,
                                     BraceItemListKind ConditionalBlockKind) {
  ...
      auto *TLCD = new (Context) TopLevelCodeDecl(CurDeclContext);
  ...
}
I didn't have any experience writing C++ before I began working on Swift and Clang, so this call to new (Context) TopLevelCodeDecl(...)
confused me. I was familiar with expressions such as new Foo()
, which allocate memory for an instance of Foo
. But the expression new (Context) TopLevelCodeDecl(...)
seemed to have an extra element: what is (Context)
here?
It turns out that C++ allows you to provide overloads of the new
operator for specific classes, and those overloads can take additional parameters. The new
operator's first argument must be a size_t
that indicates how many bytes should be allocated, but beyond that you can define an arbitrary list of parameters. Here, Context
is an argument being passed to new
.
Swift's Decl
class not only defines a custom new
operator that takes an ASTContext
argument, it also deletes the default new
operator implementation:
swift/include/swift/AST/Decl.h
/// Decl - Base class for all declarations in Swift.
class alignas(1 << DeclAlignInBits) Decl {
protected:
  ...
  // Make vanilla new/delete illegal for Decls.
  void *operator new(size_t Bytes) = delete;
  ...
  // Only allow allocation of Decls using the allocator in ASTContext
  // or by doing a placement new.
  void *operator new(size_t Bytes, const ASTContext &C,
                     unsigned Alignment = alignof(Decl));
  ...
};
This means that you cannot allocate memory for a Decl
by calling new Decl()
– you must call new (Context) Decl()
. Doing so calls the ASTContext::Allocate
member function:
swift/lib/AST/Decl.cpp
// Only allow allocation of Decls using the allocator in ASTContext.
void *Decl::operator new(size_t Bytes, const ASTContext &C,
                         unsigned Alignment) {
  return C.Allocate(Bytes, Alignment);
}
You may also have noticed this syntax earlier in this article, when CompilerInstance::performSema
called through to a function that created the SourceFile
root node in the AST. The SourceFile
class inherits from DeclContext
, which declares its own overload of the new
operator:
swift/include/swift/AST/DeclContext.h
class alignas(1 << DeclContextAlignInBits) DeclContext {
  ...
  // Only allow allocation of DeclContext using the allocator in ASTContext.
  void *operator new(size_t Bytes, ASTContext &C,
                     unsigned Alignment = alignof(DeclContext));
  ...
};
This also calls through to the ASTContext::Allocate
function:
swift/lib/AST/DeclContext.cpp
// Only allow allocation of DeclContext using the allocator in ASTContext.
void *DeclContext::operator new(size_t Bytes, ASTContext &C,
                                unsigned Alignment) {
  return C.Allocate(Bytes, Alignment);
}
The ASTContext::Allocate
member function is too interesting to explain in detail here. I'll write about it in a future article.
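For readers who, like me, hadn't seen this pattern before, here is a minimal sketch of a class that can only be allocated through a context object. Arena and Node are hypothetical stand-ins for ASTContext and Decl, and the arena simply hands out one heap block per request and frees them all in its destructor; I'm not claiming this is how ASTContext::Allocate works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical stand-in for ASTContext: owns every block it hands out and
// releases all of them together when it is destroyed.
class Arena {
public:
  Arena() = default;
  Arena(const Arena &) = delete;
  Arena &operator=(const Arena &) = delete;
  ~Arena() {
    for (void *block : blocks)
      ::operator delete(block);
  }

  void *allocate(std::size_t bytes, std::size_t alignment) {
    // Assumption of this sketch: plain ::operator new alignment is enough.
    assert(alignment <= alignof(std::max_align_t) &&
           "over-aligned types are not handled in this sketch");
    blocks.push_back(nullptr); // Reserve the slot first so nothing can leak.
    blocks.back() = ::operator new(bytes);
    return blocks.back();
  }

  std::size_t allocationCount() const { return blocks.size(); }

private:
  std::vector<void *> blocks;
};

// Hypothetical stand-in for Decl.
class Node {
public:
  explicit Node(int value) : value(value) {}

  // Make plain `new Node(...)` a compile-time error.
  void *operator new(std::size_t bytes) = delete;

  // Only allow `new (arena) Node(...)`. The compiler supplies `bytes`;
  // the arguments in parentheses after `new` fill the remaining parameters.
  void *operator new(std::size_t bytes, Arena &arena) {
    return arena.allocate(bytes, alignof(Node));
  }

  int value;
};
```

With these definitions, `Node *n = new (arena) Node(42);` compiles, while `new Node(42)` is rejected because it selects the deleted overload. Nodes are never deleted individually, and the arena frees their memory without running destructors, so a scheme like this only suits types that don't need their destructors to run.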
5.2: Parsing the print(...)
expression
To parse the expression or statement at the top level of my hello.swift
file, the Parser
must first determine whether it's an expression or a statement. It makes use of a helper function Parser::isStartOfStmt
to do so. For the most part, this just checks for keywords that clearly demarcate a statement:
swift/lib/Parse/ParseStmt.cpp
bool Parser::isStartOfStmt() {
  switch (Tok.getKind()) {
  default: return false;
  case tok::kw_return:
  case tok::kw_throw:
  case tok::kw_defer:
  case tok::kw_if:
  case tok::kw_guard:
  case tok::kw_while:
  case tok::kw_do:
  case tok::kw_repeat:
  case tok::kw_for:
  case tok::kw_break:
  case tok::kw_continue:
  case tok::kw_fallthrough:
  case tok::kw_switch:
  case tok::kw_case:
  case tok::kw_default:
  case tok::pound_if:
  case tok::pound_sourceLocation:
    return true;
  ..
  }
}
Using this function, Parser::parseExprOrStmt
determines that print(...)
is an expression, and so it calls Parser::parseExpr
:
swift/lib/Parse/ParseStmt.cpp
ParserStatus Parser::parseExprOrStmt(ASTNode &Result) {
  ..
  if (isStartOfStmt()) {
    ParserResult<Stmt> Res = parseStmt();
    if (Res.isNonNull())
      Result = Res.get();
    return Res;
  }
  ...
  ParserResult<Expr> ResultExpr = parseExpr(diag::expected_expr);
  if (ResultExpr.isNonNull()) {
    Result = ResultExpr.get();
  } else if (!ResultExpr.hasCodeCompletion()) {
    ...
  }
  ...
  return ResultExpr;
}
Parsing an expression in Swift is complicated, because expressions can be composed of a sequence of expressions of arbitrary length. For example, a && (b || (c && d)) is an expression itself, but (c && d) and (b || (c && d)) are also expressions on their own.
I'll skip over some of the complexity that this introduces in order to showcase just the path that the code takes when parsing our single-element expression sequence print("Hello, world!")
. Suffice it to say that Parser::parseExpr
eventually calls Parser::parseExprPostfixWithoutSuffix
, a function that enters a large switch
statement on the current token kind: for example, if the current token is a string literal, it calls parseExprStringLiteral
.
swift/lib/Parse/ParseExpr.cpp
ParserResult<Expr>
Parser::parseExprPostfixWithoutSuffix(Diag<> ID, bool isExprBasic) {
  ....
  ParserResult<Expr> Result;
  switch (Tok.getKind()) {
  ....
  case tok::string_literal:  // "foo"
    Result = parseExprStringLiteral();
    break;
  ....
  case tok::identifier:  // foo
    ....
    LLVM_FALLTHROUGH;
  case tok::kw_self:     // self
  case tok::kw_Self:     // Self
    Result = makeParserResult(parseExprIdentifier());

    // If there is an expr-call-suffix, parse it and form a call.
    if (Tok.isFollowingLParen()) {
      Result = parseExprCallSuffix(Result, isExprBasic);
      ....
      break;
    }
    ....
    break;
  ....
  }
  ....
  return Result;
}
In the case of print(...)
, the Parser::parseExprPostfixWithoutSuffix
function enters the tok::identifier
case. In this case, it falls through to the same case as for self
or Self
, and so it parses the identifier with Parser::parseExprIdentifier
. Then, if the next token is a left parenthesis '('
, it parses a call expression by calling Parser::parseExprCallSuffix
.
The Parser::parseExprIdentifier
member function attempts to look up the identifier name in the current scope and, if it can't find a definition, it creates an UnresolvedDeclRefExpr
node in the AST:
swift/lib/Parse/ParseExpr.cpp
Expr *Parser::parseExprIdentifier() {
  ....
  ValueDecl *D = nullptr;
  ....
    D = lookupInScope(name);
  ....
  Expr *E;
  if (D == nullptr) {
    ....
    E = new (Context) UnresolvedDeclRefExpr(name, refKind, loc);
    ....
  }
  return E;
}
An UnresolvedDeclRefExpr is an expression that references an identifier that has not yet been resolved to a declaration. The identifier might not have been defined yet, or it may never be defined, in which case this code will fail during type-checking, or the identifier could have been defined in a different module.
In this case, print
is defined in a different module: it's defined in Swift.swiftmodule
, the Swift standard library module. As a result, any references to print
will be treated as an UnresolvedDeclRefExpr
, at least until the type-checker runs and the print
identifier is read from the Swift module.
As explained above, Parser::parseExprPostfixWithoutSuffix
parses the print
identifier, then checks for an opening parenthesis '('
. If one exists, it calls Parser::parseExprCallSuffix
.
Just as with parsing expression sequences, parsing arguments to a call expression is complicated. Swift functions can take an arbitrary number of expressions as arguments, and each of those arguments can itself be a sequence of expressions. I won't go into the details here; the basic idea is that Parser::parseExprList
parses a list of expressions that are passed into a call to CallExpr::create
:
swift/lib/Parse/ParseExpr.cpp
ParserResult<Expr>
Parser::parseExprCallSuffix(ParserResult<Expr> fn, bool isExprBasic) {
  ...
  ParserStatus status = parseExprList(tok::l_paren, tok::r_paren,
                                      /*isPostfix=*/true, isExprBasic,
                                      lParenLoc, args, argLabels,
                                      argLabelLocs,
                                      rParenLoc,
                                      trailingClosure,
                                      SyntaxKind::FunctionCallArgumentList);
  ...
  auto Result = makeParserResult(status | fn,
                                 CallExpr::create(Context, fn.get(), lParenLoc,
                                                  args, argLabels, argLabelLocs,
                                                  rParenLoc, trailingClosure,
                                                  /*implicit=*/false));
  ...
  return Result;
}
In the case of hello.swift
, Parser::parseExprList
only finds a single expression. It eventually calls through to the string_literal
case in the Parser::parseExprPostfixWithoutSuffix
function posted above, which results in a call to Parser::parseExprStringLiteral
and, in turn, the createStringLiteralExprFromSegment
function:
swift/lib/Parse/ParseExpr.cpp
static StringLiteralExpr *
createStringLiteralExprFromSegment(ASTContext &Ctx,
                                   const Lexer *L,
                                   Lexer::StringSegment &Segment,
                                   SourceLoc TokenLoc) {
  ....
  return new (Ctx) StringLiteralExpr(EncodedStr, TokenLoc);
}
The Parser::parseExprList
function allocates a ParenExpr
. Whatever expressions it parses, such as the StringLiteralExpr
above, are nested within the ParenExpr
.
All told, the call to Parser::parseExprOrStmt
results in the following nodes being added to the AST:
(call_expr type='<null>' arg_labels=_:
  (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
  (paren_expr type='<null>'
    (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**)))
As shown above, the CallExpr
result is then nested within the BraceStmt
that is created within the Parser::parseBraceItems
member function. So, in sum, the AST looks like this:
(source_file
  (top_level_code_decl
    (brace_stmt
      (call_expr type='<null>' arg_labels=_:
        (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
        (paren_expr type='<null>'
          (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**))))))
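The path from tokens to this tree can be sketched as a tiny recursive-descent parser. This is an illustration under heavy assumptions, not the Swift Parser: MiniParser, Tok, and TokKind are hypothetical names, the input is an already-lexed token list, the "AST" is just a string in the dump format above (without the type and label attributes), and a call accepts at most one unlabeled argument.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token representation; the list must end with an eof token.
enum class TokKind { identifier, l_paren, r_paren, string_literal, eof };

struct Tok {
  TokKind kind;
  std::string text;
};

class MiniParser {
public:
  explicit MiniParser(std::vector<Tok> input) : tokens(std::move(input)) {
    assert(!tokens.empty() && tokens.back().kind == TokKind::eof &&
           "token stream must end with eof");
  }

  // Loop until eof, wrapping each top-level expression in a "virtual" brace
  // statement inside a top-level code declaration.
  std::string parseTopLevel() {
    std::string out = "(source_file";
    while (current().kind != TokKind::eof) {
      const std::string item = parseExpr();
      if (item.empty())
        break; // Malformed input: this sketch gives up instead of recovering.
      out += " (top_level_code_decl (brace_stmt " + item + "))";
    }
    return out + ")";
  }

private:
  const Tok &current() const { return tokens[pos]; }
  void consume() {
    if (current().kind != TokKind::eof)
      ++pos;
  }

  // Dispatch on the current token kind; returns "" on failure.
  std::string parseExpr() {
    switch (current().kind) {
    case TokKind::string_literal: {
      std::string node = "(string_literal_expr value=" + current().text + ")";
      consume();
      return node;
    }
    case TokKind::identifier: {
      std::string node =
          "(unresolved_decl_ref_expr name=" + current().text + ")";
      consume();
      // An identifier followed by '(' forms a call.
      if (current().kind == TokKind::l_paren)
        return parseCallSuffix(node);
      return node;
    }
    default:
      return "";
    }
  }

  // Parse '(' [expr] ')' and wrap the callee and argument in a call node.
  std::string parseCallSuffix(const std::string &fn) {
    consume(); // '('
    std::string args = "(paren_expr";
    if (current().kind != TokKind::r_paren) {
      const std::string arg = parseExpr();
      if (arg.empty())
        return "";
      args += " " + arg;
    }
    if (current().kind != TokKind::r_paren)
      return "";
    consume(); // ')'
    return "(call_expr " + fn + " " + args + "))";
  }

  std::vector<Tok> tokens;
  std::size_t pos = 0;
};
```

Feeding it the five tokens of hello.swift (print, '(', the string literal, ')', and eof) yields a string with the same nesting as the dump above: a call_expr containing an unresolved_decl_ref_expr and a paren_expr, inside a brace_stmt, inside a top_level_code_decl.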
Stage 6: Reaching the end of the file
When Parser::parseExprList
is called, the Parser
tasks the Lexer
with consuming each token within the parentheses that come after print
. The parsing and lexing stops at ')'
, at which point the Lexer::NextToken
is set to tok::eof
– the end of the file.
Recall that all this parsing was occurring within a while loop in Parser::parseBraceItems
. One of the termination conditions for that while loop was encountering an EOF marker. Now that the end of the file has been reached, Parser::parseBraceItems
returns control back to Parser::parseTopLevel
, which returns control back to parseIntoSourceFile
, which returns back to CompilerInstance::parseAndTypeCheckMainFile
. The stage is now set for type-checking, which is done by calling the performTypeChecking
function:
swift/lib/Frontend/Frontend.cpp
void CompilerInstance::parseAndTypeCheckMainFile(
    PersistentParserState &PersistentState,
    DelayedParsingCallbacks *DelayedParseCB,
    OptionSet<TypeCheckingFlags> TypeCheckOptions) {
  ...
  bool Done;
  do {
    ...
    parseIntoSourceFile(MainFile, MainFile.getBufferID().getValue(), &Done,
                        TheSILModule ? &SILContext : nullptr, &PersistentState,
                        DelayedParseCB);
    ...
    performTypeChecking(MainFile, PersistentState.getTopLevelContext(),
                        TypeCheckOptions, CurTUElem,
                        options.WarnLongFunctionBodies,
                        options.WarnLongExpressionTypeChecking,
                        options.SolverExpressionTimeThreshold);
    ...
  } while (!Done);
  ...
}
I'll cover type-checking in a future article.
Recap: lexing and parsing in the Swift compiler
This article covered a lot of ground:
- It explained that Swift's main function determines whether to call performFrontend by checking for the presence of the -frontend argument.
- If the -frontend argument is provided, performFrontend parses the other arguments passed into swift -frontend. It determines the requested action based on those arguments, and instantiates a CompilerInstance and ASTContext to carry out that action. If compilation is requested, it calls performCompile.
- performCompile checks the requested action and decides whether to call CompilerInstance::performSema (parsing and type-checking), or CompilerInstance::performParseOnly (just parsing). Our invocation of swift -frontend -c results in a requested action of ActionType::EmitObject, and so CompilerInstance::performSema is called.
- A Parser is initialized along with its Lexer. A Lexer reads characters and transforms strings of them into Token objects. A Parser drives the Lexer, based on the tokens it's seeing and its knowledge of Swift syntax. The parsing process begins when Parser::parseTopLevel is called.
- The Parser parses the print identifier as an UnresolvedDeclRefExpr. It sees an opening parenthesis immediately following the identifier, so it wraps the expression in a CallExpr. The arguments to the call expression are parsed: a single StringLiteralExpr, wrapped in a ParenExpr.
- The end of the file is reached, and so control goes back to CompilerInstance::performSema. Next stop: the type-checker!
To add new pieces of Swift syntax, or to modify existing Swift syntax, it's helpful to understand the libswiftParse source code. For example:
- The pull request that implemented multiline string literals in Swift modified how the Lexer formed string literal tokens.
- The pull request that implemented #error and #warning in Swift added tok::pound_warning and tok::pound_error to the list of token kinds in TokenKinds.def. It then added the Parser::parseDeclPoundDiagnostic function. This function, unlike many of the Parser functions we covered above, does not create new AST nodes. Instead, it simply emits a warning or error when it sees tok::pound_warning or tok::pound_error. Pretty clever, huh?
To learn more about how parsing works in the Swift compiler, try writing small programs and walking through the code in the compiler to see how they're parsed. A good way to do this is by attaching a debugger – read the instructions from the first article in this series, Getting Started with Swift Compiler Development, to learn how to do so. There's still a lot left to learn that I didn't cover in this article. Here are just two examples of parser mechanics that this article didn't cover:
- The parser is capable of "backtracking". That is, it can begin parsing tokens assuming a certain kind of statement or expression and, if it determines it is not actually parsing an expression of that kind, it can "undo" and reset back to where it started.
- The parser can conditionally parse portions of the program based on directives such as #if and #else.
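To give a flavor of the first item, backtracking can be sketched with a small RAII helper that remembers a position and rewinds to it unless the speculative parse is committed. This is only an illustration of the general technique: Cursor, BacktrackingScope, and tryParseEmptyCall are hypothetical names for this sketch, tokens are plain strings, and I'm not describing how the Swift parser actually saves and restores its state.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical token cursor; a real parser tracks far more state than this.
struct Cursor {
  std::vector<std::string> tokens;
  std::size_t pos = 0;

  bool atEnd() const { return pos >= tokens.size(); }
  const std::string &peek() const {
    assert(!atEnd() && "peek past the end of the token list");
    return tokens[pos];
  }
  void consume() {
    assert(!atEnd() && "consume past the end of the token list");
    ++pos;
  }
};

// Remembers the cursor position on construction and restores it on
// destruction, unless commit() was called first.
class BacktrackingScope {
public:
  explicit BacktrackingScope(Cursor &cursor)
      : cursor(cursor), savedPos(cursor.pos) {}
  BacktrackingScope(const BacktrackingScope &) = delete;
  BacktrackingScope &operator=(const BacktrackingScope &) = delete;
  ~BacktrackingScope() {
    if (!committed)
      cursor.pos = savedPos; // "Undo": rewind to where we started.
  }

  void commit() { committed = true; }

private:
  Cursor &cursor;
  std::size_t savedPos;
  bool committed = false;
};

// Speculatively parse `name ( )`. Tokens stay consumed only on success.
bool tryParseEmptyCall(Cursor &cursor) {
  BacktrackingScope scope(cursor);
  if (cursor.atEnd())
    return false;
  cursor.consume(); // Callee name (any token, in this sketch).
  for (const char *expected : {"(", ")"}) {
    if (cursor.atEnd() || cursor.peek() != expected)
      return false; // The scope's destructor rewinds the cursor.
    cursor.consume();
  }
  scope.commit();
  return true;
}
```

The appeal of tying the rewind to a destructor is that every early return backtracks automatically, so a speculative parsing function can bail out at any point without leaving the cursor half-advanced.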