Getting Started with the Swift Frontend: Lexing & Parsing

A previous article in this series explained two primary ways of invoking the swift compiler executable: swift and swift -frontend.

  1. When invoking swift -frontend, the swift executable enters its main entry point and, once it sees the -frontend option, it begins to do everything you and I think of when we think of compilers: it attempts to lex the source file it's given, parse that file into a syntax tree, type-check it, produce an object file, and so on.
  2. When invoking just swift, without the -frontend option, the swift executable splits itself up into child invocations of swift -frontend. The logic that Swift uses to split itself up is in the libswiftDriver library.

Reading and Understanding the Swift Driver Source Code explains the libswiftDriver code that is executed in the second case. This article focuses on the first case: the "compiler-y" parts of the swift compiler executable.

In a nutshell, I aim to answer the question: "What happens when I compile this simple Swift program, hello.swift?"

hello.swift

1  // hello.swift
2
3  print("Hello, world!")

Parsing a Swift source file

First, I'll recap some details covered in previous articles. For example, I've explained that I can compile the hello.swift program on the command line, by invoking swiftc hello.swift. Even this simple invocation of swiftc, because it does not include the -frontend option, is split up into child jobs by the code in libswiftDriver. I can see these child jobs by invoking swiftc hello.swift -driver-print-jobs, which outputs something like the following:

swift -frontend \
    -c hello.swift \
    -o /tmp/hello.o
ld /tmp/hello.o \
    -lSystem -arch x86_64 -macosx_version_min 10.13.0 \
    -L /Users/bgesiak/Source/apple/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-macosx-x86_64/lib/swift/macosx \
    -rpath /Users/bgesiak/Source/apple/build/Ninja-ReleaseAssert+swift-DebugAssert/swift-macosx-x86_64/lib/swift/macosx \
    -o hello

The first job invokes swift -frontend in order to produce an object file named hello.o, and the second job invokes the linker ld in order to link that object file into an executable named hello.

The first invocation appears to be very short, but make no mistake: it executes a lot of code, from a diverse set of libraries. These libraries include libswiftFrontend, libswiftParse, libswiftAST, libswiftSema, libswiftSIL, and more.

Covering each of these libraries in a single article would be exhausting. Instead, this article focuses on the first few phases of swift -frontend -c hello.swift. It'll cover libswiftFrontendTool, libswiftFrontend, and libswiftParse. These three libraries are used, in conjunction with libswiftAST, to build a tree structure that represents the untyped syntax tree of the Swift source file.

I can display the untyped syntax tree by invoking swiftc hello.swift -dump-parse, which outputs the following:

(source_file
  (top_level_code_decl
    (brace_stmt
      (call_expr type='<null>' arg_labels=_:
        (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
        (paren_expr type='<null>'
          (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**))))))

Note the difference between the untyped tree above and the typed syntax tree that swiftc -dump-ast produces:

(source_file
  (top_level_code_decl
    (brace_stmt
      (call_expr type='()' location=hello.swift:3:1 range=[hello.swift:3:1 - line:3:22] nothrow arg_labels=_:
        (declref_expr type='(Any..., String, String) -> ()' location=hello.swift:3:1 range=[hello.swift:3:1 - line:3:1] decl=Swift.(file).print(_:separator:terminator:) function_ref=single)
        (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=hello.swift:3:7 range=[hello.swift:3:6 - line:3:22] scalar_to_tuple elements=[-2, -1, -1] variadic_sources=[0] default_args_owner=Swift.(file).print(_:separator:terminator:)
          (paren_expr type='Any' location=hello.swift:3:7 range=[hello.swift:3:6 - line:3:22]
            (erasure_expr implicit type='Any' location=hello.swift:3:7 range=[hello.swift:3:7 - line:3:7]
              (string_literal_expr type='String' location=hello.swift:3:7 range=[hello.swift:3:7 - line:3:7] encoding=utf8 value="Hello, world!" builtin_initializer=Swift.(file).String.init(_builtinStringLiteral:utf8CodeUnitCount:isASCII:) initializer=**NULL**))))))))

Specifically, the untyped tree contains nodes such as unresolved_decl_ref_expr, and many of its nodes carry a type='<null>' value. The type-checker, which is implemented in libswiftSema, later fills in this type information. I'll cover libswiftSema in a future article.

The first six stages of the Swift frontend: lexing & parsing

Just how does the swift executable read the text in the hello.swift file to construct an untyped syntax tree? How does it determine that print("Hello, world!") is a call_expr that wraps a paren_expr that wraps a string_literal_expr?

At a high level, the frontend completes parsing in about six stages:

  1. The main function in swift/tools/driver/driver.cpp sees that the first argument it's been passed is -frontend, and so it invokes the libswiftFrontendTool library's performFrontend function.
  2. performFrontend parses command-line arguments to determine the FrontendOptions::RequestedAction, by using the CompilerInvocation::parseArgs member function. It then instantiates CompilerInstance and ASTContext objects based on those arguments. Finally, it calls the libswiftFrontendTool function performCompile.
  3. performCompile uses the requested FrontendOptions::RequestedAction to determine whether to call CompilerInstance::performParseOnly or CompilerInstance::performSema. Our invocation of swift -frontend -c results in a requested action of ActionType::EmitObject, and so CompilerInstance::performSema is called.
  4. The CompilerInstance::performSema member function calls through to several other member functions. It opens a bitstream cursor into the standard library module Swift.swiftmodule (it'll use this cursor later, in order to determine the type of the parsed print(...) expression). It adds a SourceFile node to the root of the AST. It also calls the parseIntoSourceFile function, which instantiates a Parser and calls the Parser::parseTopLevel member function. This member function begins the process of lexing and parsing the text in the hello.swift source file.
  5. The Parser works in tandem with a Lexer, which it creates and stores internally as part of the Parser initializer. Calling the Parser::parseTopLevel member function kicks off a loop that repeats two steps until it reaches the end of the file:
    1. The Lexer lexes a "token". A token is a series of characters in the source file that form a cohesive unit. For example, the first token lexed in hello.swift is "print". That's because the Lexer first sees a 'p', determines that it must be the start of an identifier (i.e., an alphabetical character, followed by any number of alphanumeric characters or underscores), and so continues to read in characters until it sees a '(', at which point it stops.
    2. The Parser looks at the token that the Lexer has lexed, and determines what to "parse". For example, when the Parser sees that the Lexer has lexed an identifier token of print, it asks the Lexer to lex the next token. Because that next token is a '(', it determines that it must be parsing a call expression. It asks the Lexer to lex the appropriate number of tokens in order to close the print(...) expression. Once it has, the Parser instantiates a new AST node: CallExpr.
  6. The Parser and Lexer continue this loop, instantiating new AST nodes, such as CallExpr or StringLiteralExpr, and adding them to the ASTContext, until they reach the end of the source file. Normally at this point libswiftFrontend would continue by kicking off the type-checker for hello.swift. However, that's beyond the scope of this article, so we'll stop here.
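To make the lex-then-parse loop a little more concrete, here is a toy sketch of it. This is not the Swift Parser: the Kind enum, the Token struct, and the ToyParser class are hypothetical names I made up for illustration, the tokens are supplied by hand rather than lexed, and the only grammar it understands is an identifier followed by a parenthesized string literal.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds; the real compiler has many more.
enum class Kind { Identifier, LParen, RParen, StringLiteral, Eof };

struct Token {
  Kind kind;
  std::string text;
};

// A toy parser that walks a pre-lexed token list and renders the calls
// it finds as an s-expression, loosely imitating -dump-parse output.
class ToyParser {
  const std::vector<Token> &tokens;
  std::size_t pos = 0;

  const Token &peek() const { return tokens.at(pos); }

  // Returns the current token and advances, or throws if the token is
  // not of the expected kind.
  Token consume(Kind expected) {
    if (peek().kind != expected)
      throw std::runtime_error("unexpected token: " + peek().text);
    return tokens.at(pos++);
  }

  // call-expr := identifier '(' string-literal ')'
  std::string parseCallExpr() {
    Token callee = consume(Kind::Identifier);
    consume(Kind::LParen);
    Token arg = consume(Kind::StringLiteral);
    consume(Kind::RParen);
    return "(call_expr (unresolved_decl_ref_expr name=" + callee.text +
           ") (paren_expr (string_literal_expr value=" + arg.text + ")))";
  }

public:
  // The token list must end with an Eof token and outlive the parser.
  explicit ToyParser(const std::vector<Token> &toks) : tokens(toks) {}

  // Keeps parsing top-level calls until the Eof token is reached.
  std::string parseTopLevel() {
    std::string tree = "(source_file";
    while (peek().kind != Kind::Eof)
      tree += " " + parseCallExpr();
    return tree + ")";
  }
};
```

Feeding it the five tokens of hello.swift (identifier, left paren, string literal, right paren, eof) yields a single call_expr nested inside source_file, which is the same overall shape as the untyped tree shown earlier.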

The rest of this article steps through the code behind these six steps, explaining them in more detail.

Stage 1: Parsing the -frontend argument

As explained in my libswiftDriver article, swift is just a C++ executable. Any invocation of a C++ executable begins in its main function. The swift executable's main function is defined in swift/tools/driver/driver.cpp.

Recall that one of the first things the Swift compiler's main function does is check for the first argument it's given. If that argument is -frontend, it calls the performFrontend function:

swift/tools/driver/driver.cpp

111  int main(int argc_, const char **argv_) {
...
158    StringRef FirstArg(argv[1]);
159    if (FirstArg == "-frontend") {
160      return performFrontend(llvm::makeArrayRef(argv.data()+2,
161                                                argv.data()+argv.size()),
162                             argv[0], (void *)(intptr_t)getExecutablePath);
163    }

Note that -frontend must be the first argument; invoking swift -c hello.swift -frontend is not considered a frontend invocation, and the performFrontend function will not be called. The machinery described in my previous article, Option Parsing in the Swift Compiler, hasn't been initialized yet, and so libswiftOption and libLLVMOption are not used here to check for swift::options::ID::OPT_frontend. Instead, this is a naive string comparison.
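The check is simple enough to sketch in isolation. The two helper functions below are hypothetical (they don't exist in the Swift codebase), and they use std::vector and std::string instead of the raw argv array and LLVM types that driver.cpp works with. They only illustrate the "first argument, exact string match" rule.

```cpp
#include <string>
#include <vector>

// Returns true when argv describes a frontend invocation, that is, when
// the first argument after the executable name is exactly "-frontend".
bool isFrontendInvocation(const std::vector<std::string> &argv) {
  return argv.size() > 1 && argv[1] == "-frontend";
}

// Returns the arguments that follow "-frontend", mirroring how main
// hands everything from argv.data()+2 onwards to performFrontend.
std::vector<std::string> frontendArgs(const std::vector<std::string> &argv) {
  if (!isFrontendInvocation(argv))
    return {};
  return std::vector<std::string>(argv.begin() + 2, argv.end());
}
```

With this sketch, {"swift", "-frontend", "-c", "hello.swift"} counts as a frontend invocation, while {"swift", "-c", "hello.swift", "-frontend"} does not.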

Stage 2: Instantiating a CompilerInstance based on command-line arguments

The performFrontend function is part of the libswiftFrontendTool library. libswiftFrontendTool is a tiny Swift compiler library whose single purpose is to drive the frontend compilation process.

The build code for libswiftFrontendTool – which determines the files that are included in the library, how it is built, and which libraries it depends on – is all defined in CMake. Check out the swift/lib/FrontendTool/CMakeLists.txt file to learn more.

There's nothing particularly exciting about the libswiftFrontendTool CMake, so I won't cover it here. A previous article, Reading and Understanding the CMake in apple/swift, has tips on how to read the CMake yourself.

The performFrontend function instantiates a CompilerInstance and a CompilerInvocation, and then performs argument parsing:

swift/lib/FrontendTool/FrontendTool.cpp

1304  int swift::performFrontend(ArrayRef<const char *> Args,
1305                             const char *Argv0, void *MainAddr,
1306                             FrontendObserver *observer) {
....  
1348    std::unique_ptr<CompilerInstance> Instance =
1349      llvm::make_unique<CompilerInstance>();
....
1371    CompilerInvocation Invocation;
....
1379    // Parse arguments.
1380    if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
1381      return finishDiagProcessing(1);
1382    }
....  
1542  }

As I mentioned in previous articles, I'll elide source code that isn't relevant to understanding the topic at hand: how the Swift compiler parses Swift source code. Besides what's shown above, the performFrontend function also calls functions like llvm::InitializeAllTargets. If it didn't, the Swift compiler would not be able to generate object files. But this article isn't about generating object files, so I won't include that code in these snippets.

The CompilerInstance class is arguably one of the most important classes in the Swift codebase. It's defined as part of the libswiftFrontend library. It's significant because it holds unique, owning references to several important singletons. For example, it holds a unique pointer to the ASTContext, which is a singleton that stores all AST nodes that are created by the compiler.

swift/include/swift/Frontend/Frontend.h

304  /// A class which manages the state and execution of the compiler.
305  /// This owns the primary compiler singletons, such as the ASTContext,
306  /// as well as various build products such as the SILModule.
307  ///
308  /// Before a CompilerInstance can be used, it must be configured by
309  /// calling \a setup.  If successful, this will create an ASTContext
310  /// and set up the basic compiler invariants.  Calling \a setup multiple
311  /// times on a single CompilerInstance is not permitted.
312  class CompilerInstance {
313    CompilerInvocation Invocation;
314    SourceManager SourceMgr;
315    DiagnosticEngine Diagnostics{SourceMgr};
316    std::unique_ptr<ASTContext> Context;
317    std::unique_ptr<SILModule> TheSILModule;
...
578  };

The performFrontend function also instantiates a CompilerInvocation object. Not to be confused with CompilerInstance, the CompilerInvocation class is practically a glorified bag of options. It's responsible for parsing command-line arguments to swift -frontend, and storing their values in various "options" classes, like FrontendOptions or LangOptions.

swift/include/swift/Frontend/Frontend.h

 53  /// The abstract configuration of the compiler, including:
 54  ///   - options for all stages of translation,
 55  ///   - information about the build environment,
 56  ///   - information about the job being performed, and
 57  ///   - lists of inputs.
 58  ///
 59  /// A CompilerInvocation can be built from a frontend command line
 60  /// using parseArgs.  It can then be used to build a CompilerInstance,
 61  /// which manages the actual compiler execution.
 62  class CompilerInvocation {
 63    LangOptions LangOpts;
 64    FrontendOptions FrontendOpts;
 65    ClangImporterOptions ClangImporterOpts;
 66    SearchPathOptions SearchPathOpts;
 67    DiagnosticOptions DiagnosticOpts;
 68    MigratorOptions MigratorOpts;
 69    SILOptions SILOpts;
 70    IRGenOptions IRGenOpts;
 ..  
 80  public:
 ..
 83    /// Initializes the compiler invocation for the list of arguments.
 ..
 92    /// \returns true if there was an error, false on success.
 93    bool parseArgs(ArrayRef<const char *> Args, DiagnosticEngine &Diags,
 94                   StringRef workingDirectory = {});
 ..
302  };

A CompilerInvocation object, then, is basically just a collection of parameters that are used to initialize the all-important CompilerInstance. The command-line arguments to swift -frontend are translated into the appropriate settings on a CompilerInvocation object, via its CompilerInvocation::parseArgs member function.

2.1: Parsing frontend command-line arguments

If you've read Option Parsing in the Swift Compiler, the body of the CompilerInvocation::parseArgs member function should look very familiar to you. It parses arguments using the exact same libswiftOption and libLLVMOption abstractions as the Swift driver does: createSwiftOptTable and llvm::opt::OptTable::ParseArgs.

swift/lib/Frontend/CompilerInvocation.cpp

 962  bool CompilerInvocation::parseArgs(ArrayRef<const char *> Args,
 963                                     DiagnosticEngine &Diags,
 964                                     StringRef workingDirectory) {
 ...
 973    std::unique_ptr<llvm::opt::OptTable> Table = createSwiftOptTable();
 974    llvm::opt::InputArgList ParsedArgs =
 975        Table->ParseArgs(Args, MissingIndex, MissingCount, FrontendOption);
 ...  
 990    if (ParseFrontendArgs(FrontendOpts, ParsedArgs, Diags)) {
 991      return true;
 992    }
 993  
 994    if (ParseLangArgs(LangOpts, ParsedArgs, Diags, FrontendOpts)) {
 995      return true;
 996    }
 ...
1030    return false;
1031  }
1032  

After converting the command-line strings into an llvm::opt::InputArgList via the llvm::opt::OptTable::ParseArgs member function, the CompilerInvocation::parseArgs member function calls functions like ParseFrontendArgs, ParseLangArgs, and so on, in order to set values on members such as CompilerInvocation::LangOpts.

For example, ParseLangArgs is responsible for translating the swift::options::ID::OPT_swift_version stored on the llvm::opt::InputArgList, and using that to set CompilerInvocation::LangOpts::EffectiveLanguageVersion. If the version passed in is invalid, it emits an error diagnostic:

swift/lib/Frontend/CompilerInvocation.cpp

125  static bool ParseLangArgs(LangOptions &Opts, ArgList &Args,
126                            DiagnosticEngine &Diags,
127                            const FrontendOptions &FrontendOpts) {
...
136    if (auto A = Args.getLastArg(OPT_swift_version)) {
137      auto vers = version::Version::parseVersionString(
138        A->getValue(), SourceLoc(), &Diags);
139      bool isValid = false;
140      if (vers.hasValue()) {
141        if (auto effectiveVers = vers.getValue().getEffectiveLanguageVersion()) {
142          Opts.EffectiveLanguageVersion = effectiveVers.getValue();
143          isValid = true;
144        }
145      }
146      if (!isValid)
147        diagnoseSwiftVersion(vers, A, Args, Diags);
148    }
...  
355  }

To test this out, we can try passing swift -frontend an invalid language version, such as swift -frontend -c hello.swift -swift-version foo. This outputs:

<unknown>:0: error: version component contains non-numeric characters
<unknown>:0: error: invalid value 'foo' in '-swift-version foo'
<unknown>:0: note: valid arguments to '-swift-version' are '3', '4', '5'

If you're curious what other combinations of Swift language versions are valid, you can take a look at the libswiftBasic functions that the ParseLangArgs function uses above: Version::parseVersionString and Version::getEffectiveLanguageVersion.
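As a rough illustration of the two checks behind those diagnostics, here is a sketch of my own. It is not the libswiftBasic implementation: both function names are hypothetical, and the rule that only major versions 3, 4, and 5 are accepted is an assumption I've taken from the diagnostic note above, not from the real Version::getEffectiveLanguageVersion logic.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits a dotted version string such as "4.2" into numeric components.
// Returns false if any component is empty, non-numeric, or longer than
// nine digits (which keeps the conversion below from overflowing).
bool parseVersionComponents(const std::string &text,
                            std::vector<unsigned> &components) {
  components.clear();
  std::string current;
  for (std::size_t i = 0; i <= text.size(); ++i) {
    if (i == text.size() || text[i] == '.') {
      if (current.empty() || current.size() > 9)
        return false;
      components.push_back(static_cast<unsigned>(std::stoul(current)));
      current.clear();
    } else if (std::isdigit(static_cast<unsigned char>(text[i]))) {
      current += text[i];
    } else {
      return false; // "version component contains non-numeric characters"
    }
  }
  return true;
}

// Assumption: only the major versions named in the note are accepted.
bool isSupportedLanguageVersion(const std::string &text) {
  std::vector<unsigned> components;
  if (!parseVersionComponents(text, components))
    return false;
  unsigned major = components.front();
  return major == 3 || major == 4 || major == 5;
}
```

In this sketch "foo" fails the first check, just as it triggers the first diagnostic above, and "6" parses cleanly but fails the second.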

One of the most important parts of this argument parsing is done in the ParseFrontendArgs function. This calls through to the ArgsToFrontendOptionsConverter::determineRequestedAction member function, in order to set the CompilerInvocation object's FrontendOptions::RequestedAction, based on whether the frontend was invoked with -emit-object, -emit-sil, or some other option. This "requested action" will determine what logic the frontend executes:

swift/lib/Frontend/ArgsToFrontendOptionsConverter.cpp

269  FrontendOptions::ActionType
270  ArgsToFrontendOptionsConverter::determineRequestedAction() const {
271    using namespace options;
272    const Arg *A = Args.getLastArg(OPT_modes_Group);
...
283    Option Opt = A->getOption();
284    if (Opt.matches(OPT_emit_object))
285      return FrontendOptions::ActionType::EmitObject;
286    if (Opt.matches(OPT_emit_assembly))
287      return FrontendOptions::ActionType::EmitAssembly;
288    if (Opt.matches(OPT_emit_ir))
289      return FrontendOptions::ActionType::EmitIR;
...
308    if (Opt.matches(OPT_dump_parse))
309      return FrontendOptions::ActionType::DumpParse;
310    if (Opt.matches(OPT_dump_ast))
311      return FrontendOptions::ActionType::DumpAST;
...
330    llvm_unreachable("Unhandled mode option");
331  }

My invocation of swift -frontend -c hello.swift does not appear to include -emit-object, the option that OPT_emit_object stands for. However, a quick peek at Options.td (covered in depth in the article on Option Parsing in the Swift Compiler) reveals that -c is an alias for -emit-object:

swift/include/swift/Option/Options.td

571  def c : Flag<["-"], "c">, Alias<emit_object>,
572    Flags<[FrontendOption, NoInteractiveOption]>, ModeOpt;
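Putting the alias and the mode lookup together, the logic can be sketched without any of the LLVM option machinery. Everything below is a simplification of my own: the function names resolveAlias and actionForFlag are hypothetical, the enum is abbreviated, and returning NoneAction when no mode flag is present is an assumption on my part, since that part of the real function is elided above.

```cpp
#include <string>
#include <vector>

// Hypothetical, abbreviated version of FrontendOptions::ActionType.
enum class ActionType {
  NoneAction, EmitObject, EmitAssembly, EmitIR, DumpParse, DumpAST
};

// Resolves an alias to its canonical spelling. Only the -c alias shown
// in Options.td is modeled here.
std::string resolveAlias(const std::string &flag) {
  if (flag == "-c")
    return "-emit-object";
  return flag;
}

// Maps a canonical mode flag to an action. Returns false for arguments
// that are not mode flags (input files, -o, and so on).
bool actionForFlag(const std::string &flag, ActionType &action) {
  if (flag == "-emit-object")   { action = ActionType::EmitObject;   return true; }
  if (flag == "-emit-assembly") { action = ActionType::EmitAssembly; return true; }
  if (flag == "-emit-ir")       { action = ActionType::EmitIR;       return true; }
  if (flag == "-dump-parse")    { action = ActionType::DumpParse;    return true; }
  if (flag == "-dump-ast")      { action = ActionType::DumpAST;      return true; }
  return false;
}

// Like Args.getLastArg(OPT_modes_Group), the last mode flag wins.
// Assumption: with no mode flag at all, this sketch returns NoneAction.
ActionType determineRequestedAction(const std::vector<std::string> &args) {
  ActionType action = ActionType::NoneAction;
  for (const std::string &arg : args) {
    ActionType candidate = ActionType::NoneAction;
    if (actionForFlag(resolveAlias(arg), candidate))
      action = candidate;
  }
  return action;
}
```

Given the arguments {"-c", "hello.swift", "-o", "/tmp/hello.o"}, this sketch resolves -c to -emit-object and reports ActionType::EmitObject.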

2.2: Instantiating the ASTContext via the CompilerInstance::setup member function

After the arguments have been parsed, the performFrontend function continues by finishing the initialization of the CompilerInstance. To do so, it calls the member function CompilerInstance::setup:

swift/lib/FrontendTool/FrontendTool.cpp

1304  int swift::performFrontend(ArrayRef<const char *> Args,
1305                             const char *Argv0, void *MainAddr,
1306                             FrontendObserver *observer) {
....  
1348    std::unique_ptr<CompilerInstance> Instance =
1349      llvm::make_unique<CompilerInstance>();
....
1371    CompilerInvocation Invocation;
....
1379    // Parse arguments.
1380    if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
1381      return finishDiagProcessing(1);
1382    }
....  
1464    if (Instance->setup(Invocation)) {
1465      return finishDiagProcessing(1);
1466    }
....
1542  }

CompilerInstance::setup allocates a new ASTContext. ASTContext is responsible for creating, allocating memory for, and owning the nodes of the syntax tree.

swift/lib/Frontend/Frontend.cpp

 76  bool CompilerInstance::setup(const CompilerInvocation &Invok) {
 77    Invocation = Invok;
 ..  
 93    Context.reset(new ASTContext(Invocation.getLangOptions(),
 94                                 Invocation.getSearchPathOptions(), SourceMgr,
 95                                 Diagnostics));
 96  
 ..
106  }

There's no overstating the importance of the ASTContext class; if you do any work on the Swift compiler, you'll see this class used everywhere.

swift/include/swift/AST/ASTContext.h

178  /// ASTContext - This object creates and owns the AST objects.
179  /// However, this class does more than just maintain context within an AST.
180  /// It is the closest thing to thread-local or compile-local storage in this
181  /// code base. Why? SourceKit uses this code with multiple threads per Unix
182  /// process. Each thread processes a different source file. Each thread has its
183  /// own instance of ASTContext, and that instance persists for the duration of
184  /// the thread, throughout all phases of the compilation. (The name "ASTContext"
185  /// is a bit of a misnomer here.) Why not use thread-local storage? This code
186  /// may use DispatchQueues and pthread-style TLS won't work with code that uses
187  /// DispatchQueues. Summary: if you think you need a global or static variable,
188  /// you probably need to put it here instead.
189  
190  class ASTContext {
...
924  };

I'll write more on ASTContext below. For now, let's return to performFrontend. Having called CompilerInstance::setup, it calls performCompile.

swift/lib/FrontendTool/FrontendTool.cpp

1304  int swift::performFrontend(ArrayRef<const char *> Args,
1305                             const char *Argv0, void *MainAddr,
1306                             FrontendObserver *observer) {
....  
1348    std::unique_ptr<CompilerInstance> Instance =
1349      llvm::make_unique<CompilerInstance>();
....
1371    CompilerInvocation Invocation;
....
1379    // Parse arguments.
1380    if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
1381      return finishDiagProcessing(1);
1382    }
....  
1464    if (Instance->setup(Invocation)) {
1465      return finishDiagProcessing(1);
1466    }
....
1508    int ReturnValue = 0;
1509    bool HadError =
1510      performCompile(*Instance, Invocation, Args, ReturnValue, observer,
1511                     StatsReporter.get());
....
1542  }

Stage 3: Kicking off libswiftParse (and libswiftSema)

The performCompile function (also defined as part of the libswiftFrontendTool library) looks at the FrontendOptions::RequestedAction (set in stage 2.1 above) and calls either CompilerInstance::performParseOnly or CompilerInstance::performSema.

swift/lib/FrontendTool/FrontendTool.cpp

528  static bool performCompile(CompilerInstance &Instance,
529                             CompilerInvocation &Invocation,
530                             ArrayRef<const char *> Args,
531                             int &ReturnValue,
532                             FrontendObserver *observer,
533                             UnifiedStatsReporter *Stats) {
534    FrontendOptions opts = Invocation.getFrontendOptions();
535    FrontendOptions::ActionType Action = opts.RequestedAction;
...  
608    if (Action == FrontendOptions::ActionType::Parse ||
609        Action == FrontendOptions::ActionType::DumpParse ||
610        Action == FrontendOptions::ActionType::EmitSyntax ||
611        Action == FrontendOptions::ActionType::DumpInterfaceHash ||
612        Action == FrontendOptions::ActionType::EmitImportedModules)
613      Instance.performParseOnly();
614    else
615      Instance.performSema();
...
857  }

As explained in stage 2.1 above, when I invoke swift -frontend -c hello.swift, it results in a request for ActionType::EmitObject, so the CompilerInstance::performSema member function is called.
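The decision boils down to a predicate over the requested action. Here's a minimal sketch of that predicate. The function name stopsAfterParsing is hypothetical, and the enum lists only the actions mentioned in this article, not the full FrontendOptions::ActionType.

```cpp
// Hypothetical, abbreviated version of FrontendOptions::ActionType.
enum class ActionType {
  Parse, DumpParse, EmitSyntax, DumpInterfaceHash, EmitImportedModules,
  DumpAST, EmitObject
};

// Returns true for actions that only need an untyped syntax tree (the
// performParseOnly path), and false for actions that also need
// type-checking (the performSema path).
bool stopsAfterParsing(ActionType action) {
  switch (action) {
  case ActionType::Parse:
  case ActionType::DumpParse:
  case ActionType::EmitSyntax:
  case ActionType::DumpInterfaceHash:
  case ActionType::EmitImportedModules:
    return true;
  case ActionType::DumpAST:
  case ActionType::EmitObject:
    return false;
  }
  return false;
}
```

This also lines up with the two dumps shown earlier: -dump-parse stops after parsing, whereas -dump-ast goes through performSema so that the tree has types to print.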

Stage 4: Opening a Swift.swiftmodule bitstream cursor and kicking off the parsing loop

The CompilerInstance::performSema member function loads the Swift standard library and then calls through to CompilerInstance::parseAndCheckTypes:

swift/lib/Frontend/Frontend.cpp

369  void CompilerInstance::performSema() {
...
382      if (!loadStdlib())
383        return;
...  
398    if (MainBufferID != NO_SUCH_BUFFER)
399      addMainFileToModule(implicitImports);
400
401    parseAndCheckTypes(implicitImports);
402  }  

The CompilerInstance::loadStdlib member function opens a cursor into the Swift standard library module file. Swift module files are Swift ASTs, serialized into a binary format called an LLVM bitstream. Later, when libswiftSema type-checks the hello.swift file, it will look up the print function using the cursor created here.

This article only covers parsing, not type-checking. I'll write more about the CompilerInstance::loadStdlib member function, as well as about Swift modules in general, in a future article, which I'll publish before writing about the Swift type-checker.

Next, CompilerInstance::performSema calls CompilerInstance::addMainFileToModule, which calls through to CompilerInstance::createSourceFileForMainModule.

swift/lib/Frontend/Frontend.cpp

744  SourceFile *CompilerInstance::createSourceFileForMainModule(
745      SourceFileKind fileKind, SourceFile::ImplicitModuleImportKind importKind,
746      Optional<unsigned> bufferID) {
...
749    SourceFile *inputFile = new (*Context)
750        SourceFile(*mainModule, fileKind, bufferID, importKind, keepSyntaxInfo);
751    MainModule->addFile(*inputFile);
...  
757    return inputFile;
758  }

This creates the SourceFile node at the root of the AST that is printed out when invoking swiftc -dump-parse hello.swift. The SourceFile class derives from the DeclContext class, which is used in the AST code as a container for arbitrary declarations.
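The new (*Context) SourceFile(...) syntax in the snippet above passes the context to operator new, so that the context, not the general-purpose heap, supplies the node's memory. Here's a minimal sketch of that C++ idiom. ToyContext and ToyNode are hypothetical stand-ins, and the one-block-per-node bookkeeping is a simplification I chose for brevity; I'm not claiming it matches how ASTContext manages memory internally.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for ASTContext: it owns every block of memory it
// hands out and frees them all when the context itself is destroyed.
class ToyContext {
  std::vector<std::unique_ptr<unsigned char[]>> blocks;

public:
  void *allocate(std::size_t bytes) {
    blocks.push_back(
        std::unique_ptr<unsigned char[]>(new unsigned char[bytes]));
    return blocks.back().get();
  }

  std::size_t allocationCount() const { return blocks.size(); }
};

// Nodes opt in to context allocation by overloading operator new.
// Destructors are never run, so nodes in this sketch must be trivially
// destructible.
struct ToyNode {
  int line;

  explicit ToyNode(int line) : line(line) {}

  // Selected by the expression: new (context) ToyNode(...)
  void *operator new(std::size_t bytes, ToyContext &context) {
    return context.allocate(bytes);
  }

  // Matching placement delete; only called if the constructor throws.
  // The context still owns the block, so there is nothing to do here.
  void operator delete(void *, ToyContext &) noexcept {}
};
```

A node is then created with ToyNode *node = new (context) ToyNode(3); and is never deleted individually. Its memory goes away when the context does.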

Back in CompilerInstance::performSema, a call to CompilerInstance::parseAndCheckTypes is made, which then calls through to CompilerInstance::parseAndTypeCheckMainFile. This calls the parseIntoSourceFile function multiple times – as many times as that function continues to find Swift code at the top level of the file to parse, until it hits the end of the file.

swift/lib/Frontend/Frontend.cpp

659  void CompilerInstance::parseAndTypeCheckMainFile(
660      PersistentParserState &PersistentState,
661      DelayedParsingCallbacks *DelayedParseCB,
662      OptionSet<TypeCheckingFlags> TypeCheckOptions) {
...
677    bool Done;
678    do {
...
683      parseIntoSourceFile(MainFile, MainFile.getBufferID().getValue(), &Done,
684                          TheSILModule ? &SILContext : nullptr, &PersistentState,
685                          DelayedParseCB);
...
695    } while (!Done);
...
713  }

The parseIntoSourceFile function is defined as part of a tiny compiler library named libswiftParseSIL (the function was recently split out into its own library in order to break circular library dependencies in the compiler). It instantiates a Parser and calls Parser::parseTopLevel. Note that it sets the value that the Done pointer refers to, based on whether it's found the eof "token":

swift/lib/ParseSIL/ParseSIL.cpp

101  bool swift::parseIntoSourceFile(SourceFile &SF,
102                                  unsigned BufferID,
103                                  bool *Done,
104                                  SILParserState *SIL,
105                                  PersistentParserState *PersistentState,
106                                  DelayedParsingCallbacks *DelayedParseCB) {
...
108    Parser P(BufferID, SF, SIL ? SIL->Impl.get() : nullptr, PersistentState);
...  
116    bool FoundSideEffects = P.parseTopLevel();
117    *Done = P.Tok.is(tok::eof);
118  
119    return FoundSideEffects;
120  }
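The interplay between the do-while loop in parseAndTypeCheckMainFile and the Done out-parameter can be sketched as follows. All of the names here are hypothetical, and a "chunk" is just a string standing in for a run of top-level code. The point is only the control flow: the callee parses a bit and reports whether it's finished, and the caller loops until it is.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for parser state that persists between calls.
struct ToyParserState {
  std::size_t nextChunk = 0;
};

// Consumes at most one chunk per call and reports through *done whether
// the input is exhausted. Returns true if a chunk was consumed.
bool parseOneChunk(const std::vector<std::string> &chunks,
                   ToyParserState &state, bool *done,
                   std::vector<std::string> &parsed) {
  bool consumed = false;
  if (state.nextChunk < chunks.size()) {
    parsed.push_back(chunks[state.nextChunk++]);
    consumed = true;
  }
  *done = state.nextChunk >= chunks.size();
  return consumed;
}

// The caller's side: keep calling until the callee says it is done.
// Returns how many calls that took.
std::size_t parseAll(const std::vector<std::string> &chunks,
                     std::vector<std::string> &parsed) {
  ToyParserState state;
  std::size_t calls = 0;
  bool done = false;
  do {
    parseOneChunk(chunks, state, &done, parsed);
    ++calls;
  } while (!done);
  return calls;
}
```

Because this is a do-while loop, the callee is always invoked at least once, even for an empty input.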

4.1: More on the Lexer and tokens

The Parser initializer that's used in the parseIntoSourceFile function creates a new Lexer object:

swift/lib/Parse/Parser.cpp

329  Parser::Parser(unsigned BufferID, SourceFile &SF, SILParserTUStateBase *SIL,
330                 PersistentParserState *PersistentState)
331      : Parser(
332            std::unique_ptr<Lexer>(new Lexer(
333                SF.getASTContext().LangOpts, SF.getASTContext().SourceMgr,
334                BufferID, &SF.getASTContext().Diags,
335                /*InSILMode=*/SIL != nullptr,
336                SF.getASTContext().LangOpts.AttachCommentsToDecls
337                    ? CommentRetentionMode::AttachToNextToken
338                    : CommentRetentionMode::None,
339                SF.shouldKeepSyntaxInfo()
340                    ? TriviaRetentionMode::WithTrivia
341                    : TriviaRetentionMode::WithoutTrivia)),
342            SF, SIL, PersistentState) {}

A Lexer is responsible for reading in the individual characters in a source file and forming logical chunks, called tokens.
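To give a feel for what "forming logical chunks" involves, here is a toy lexer of my own. It is not the Swift Lexer: the Token struct and the lex function are hypothetical, token kinds are plain strings, and it only knows about identifiers, parentheses, double-quoted strings without escapes, whitespace, and // comments.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Token {
  std::string kind;
  std::string text;
  unsigned line;
  unsigned column;
};

// Splits source text into tokens, skipping whitespace and // comments,
// and appends a final "eof" token. Lines and columns are 1-based.
std::vector<Token> lex(const std::string &src) {
  std::vector<Token> tokens;
  std::size_t i = 0;
  unsigned line = 1, column = 1;

  // Advances one character, keeping line and column in sync.
  auto advance = [&]() {
    if (src[i] == '\n') { ++line; column = 1; } else { ++column; }
    ++i;
  };
  auto isIdentifierChar = [](char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
  };

  while (i < src.size()) {
    char c = src[i];
    if (std::isspace(static_cast<unsigned char>(c))) { advance(); continue; }
    if (c == '/' && i + 1 < src.size() && src[i + 1] == '/') {
      while (i < src.size() && src[i] != '\n') advance(); // skip comment
      continue;
    }

    Token token{"unknown", "", line, column};
    std::size_t start = i;
    if (std::isalpha(static_cast<unsigned char>(c)) || c == '_') {
      token.kind = "identifier";
      while (i < src.size() && isIdentifierChar(src[i])) advance();
    } else if (c == '(') {
      token.kind = "l_paren";
      advance();
    } else if (c == ')') {
      token.kind = "r_paren";
      advance();
    } else if (c == '"') {
      token.kind = "string_literal";
      advance(); // opening quote
      while (i < src.size() && src[i] != '"') advance();
      if (i < src.size()) advance(); // closing quote, if present
    } else {
      advance(); // anything else becomes a one-character "unknown" token
    }
    token.text = src.substr(start, i - start);
    tokens.push_back(token);
  }
  tokens.push_back(Token{"eof", "", line, column});
  return tokens;
}
```

Run over the three lines of hello.swift, this produces five tokens: an identifier print at line 3, column 1, followed by a left paren, a string literal, a right paren, and eof. The comment on line 1 and the blank line 2 produce no tokens at all.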

The Clang compiler, which compiles C, C++, and Objective-C source code, can print the tokens it lexes, using the -dump-tokens frontend option. For example, consider the following simple C program hello.c:

int main() {
  return 0;
}

I can invoke the Clang frontend, clang -cc1, to dump the tokens in this file (Clang has a driver and frontend system that is nearly identical to Swift's, except that Clang takes the argument -cc1 instead of -frontend). clang -cc1 -dump-tokens hello.c outputs the following:

int 'int'             Loc=<hello.c:1:1>  [StartOfLine]
identifier 'main'     Loc=<hello.c:1:5>  [LeadingSpace]
l_paren '('           Loc=<hello.c:1:9>
r_paren ')'           Loc=<hello.c:1:10>
l_brace '{'           Loc=<hello.c:1:12> [LeadingSpace]
return 'return'       Loc=<hello.c:2:3>  [StartOfLine] [LeadingSpace]
numeric_constant '0'  Loc=<hello.c:2:10> [LeadingSpace]
semi ';'              Loc=<hello.c:2:11>
r_brace '}'           Loc=<hello.c:3:1>  [StartOfLine]
eof ''                Loc=<hello.c:3:2>

The Swift compiler executable does not have an option to print the tokens in a file (although if any readers are interested in contributing, this would be a great feature to add!), but if it did, the tokens in hello.swift would be output like this:

identifier 'print'                Loc=<hello.swift:3:1>  [StartOfLine]
l_paren '('                       Loc=<hello.swift:3:6>
string_literal '"Hello, world!"'  Loc=<hello.swift:3:7>
r_paren ')'                       Loc=<hello.swift:3:22>
eof ''                            Loc=<hello.swift:3:23>

Note that the comment at the top of the file, // hello.swift, and the empty line below that comment, are not represented as tokens. The Swift compiler can be invoked such that it creates tokens for comments and whitespace, but normally they are discarded entirely by the compiler.

The first column of the output above displays the token "kind". Kinds of Swift tokens include identifier, l_paren, and eof. You can find a list of all the different Swift token kinds in swift/include/swift/Syntax/TokenKinds.def. An enum of all the different token kinds is defined in swift/include/swift/Syntax/TokenKinds.h, using a trick readers of my option parsing article should be familiar with: it defines the TOKEN macro, and then includes the TokenKinds.def file, which contains a call to TOKEN for each token kind.

swift/include/swift/Syntax/TokenKinds.h

21  enum class tok {
22  #define TOKEN(X) X,
23  #include "swift/Syntax/TokenKinds.def"
24  
25    NUM_TOKENS
26  };

This creates one enum case per token kind. The if keyword, for example, gets an enum case named tok::kw_if:

swift/include/swift/Syntax/TokenKinds.def

 39  /// KEYWORD(kw)
 40  ///   Expands by default for every Swift keyword and every SIL keyword, such as
 41  ///   'if', 'else', 'sil_global', etc. If you only want to use Swift keywords
 42  ///   see SWIFT_KEYWORD.
 43  #ifndef KEYWORD
 44  #define KEYWORD(kw) TOKEN(kw_ ## kw)
 45  #endif
 46  
 47  /// SWIFT_KEYWORD(kw)
 48  ///   Expands for every Swift keyword.
 49  #ifndef SWIFT_KEYWORD
 50  #define SWIFT_KEYWORD(kw) KEYWORD(kw)
 51  #endif
 ..  
 59  /// STMT_KEYWORD(kw)
 60  ///   Expands for every Swift keyword used in statement grammar.
 61  #ifndef STMT_KEYWORD
 62  #define STMT_KEYWORD(kw) SWIFT_KEYWORD(kw)
 63  #endif
...  
169  STMT_KEYWORD(if)

The Token class stores information about a token: its kind and its text. It also defines member functions such as Token::is, so that other parts of the compiler can quickly check "is this token an if keyword?", by invoking Tok.is(tok::kw_if). Or, to check simply that the token is a keyword, the Token::isKeyword member function is implemented using a macro and an include:

swift/include/swift/Parse/Token.h

 33  class Token {
 34    /// Kind - The actual flavor of token this is.
 35    ///
 36    tok Kind;
 51  
 52    /// Text - The actual string covered by the token in the source buffer.
 53    StringRef Text;
 ..  
 61  public:
 ..    
 76    /// is/isNot - Predicates to check if this token is a specific kind, as in
 77    /// "if (Tok.is(tok::l_brace)) {...}".
 78    bool is(tok K) const { return Kind == K; }
...  
214    /// True if the token is any keyword.
215    bool isKeyword() const {
216      switch (Kind) {
217  #define KEYWORD(X) case tok::kw_##X: return true;
218  #include "swift/Syntax/TokenKinds.def"
219      default: return false;
220      }
221    }
...
302  };
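
The layering of KEYWORD on top of TOKEN can be hard to picture from excerpts alone, so here's a small sketch of the same idea. It assumes a hypothetical two-part list macro, TOY_TOKEN_LIST, in place of the real .def file, and the ToyKind and ToyToken types are made up for this example. The enum treats every keyword as a kw_-prefixed token, while isKeyword ignores plain tokens and only emits cases for keywords.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for TokenKinds.def, with plain tokens and keywords.
#define TOY_TOKEN_LIST(TOKEN, KEYWORD) \
  TOKEN(identifier)                    \
  TOKEN(l_paren)                       \
  TOKEN(eof)                           \
  KEYWORD(if)                          \
  KEYWORD(else)

enum class ToyKind {
#define TOY_TOKEN(X) X,
// By default a keyword is just a token whose name is prefixed with kw_.
#define TOY_KEYWORD(X) TOY_TOKEN(kw_##X)
  TOY_TOKEN_LIST(TOY_TOKEN, TOY_KEYWORD)
#undef TOY_KEYWORD
#undef TOY_TOKEN
};

class ToyToken {
  ToyKind Kind;
  std::string Text;

public:
  ToyToken(ToyKind K, std::string T) : Kind(K), Text(std::move(T)) {}

  bool is(ToyKind K) const { return Kind == K; }
  bool isNot(ToyKind K) const { return Kind != K; }
  const std::string &getText() const { return Text; }

  // Plain tokens expand to nothing here; each keyword expands to a case.
  bool isKeyword() const {
    switch (Kind) {
#define TOY_TOKEN(X)
#define TOY_KEYWORD(X) case ToyKind::kw_##X: return true;
      TOY_TOKEN_LIST(TOY_TOKEN, TOY_KEYWORD)
#undef TOY_KEYWORD
#undef TOY_TOKEN
    default: return false;
    }
  }
};
```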

4.2: Priming the Lexer to form a token for the identifier "print"

Normally the Parser prompts the Lexer to lex the next token in a file. The Lexer also, when it's initialized, lexes the first token in the file by calling the Lexer::primeLexer member function.

swift/include/swift/Parse/Lexer.h

 64  class Lexer {
...
169    Lexer(const LangOptions &Options,
170          const SourceManager &SourceMgr, unsigned BufferID,
171          DiagnosticEngine *Diags, bool InSILMode,
172          CommentRetentionMode RetainComments = CommentRetentionMode::None,
173          TriviaRetentionMode TriviaRetention = TriviaRetentionMode::WithoutTrivia)
174        : Lexer(Options, SourceMgr, Diags, BufferID, InSILMode, RetainComments,
175                TriviaRetention) {
176      primeLexer();
177    }
...
523  };

Lexer::primeLexer calls through to Lexer::lexImpl. This member function implements the core of the lexing functionality in the Swift compiler; it is the method that, based on the first character in a series, determines whether to lex an identifier, a number literal, an operator, or some other token.

The Lexer keeps a pointer to the character it's currently lexing using a member Lexer::CurPtr. Lexer::lexImpl moves the pointer to the next character and then enters a switch statement based on the character's value. Here are the switch statement cases that are relevant to my source file hello.swift:

swift/lib/Parse/Lexer.cpp

2041  void Lexer::lexImpl() {
....
2068  Restart:
....
2074    switch ((signed char)*CurPtr++) {
....
2150    case ' ':
2151    case '\t':
2152    case '\f':
2153    case '\v':
2154      goto Restart;  // Skip whitespace.
....  
2193    case '(': return formToken(tok::l_paren, TokStart);
2194    case '}': return formToken(tok::r_brace, TokStart);
2195    case ']': return formToken(tok::r_square, TokStart);
2196    case ')':
2197      return formToken(tok::r_paren, TokStart);
....  
2209    case '/':
2210      if (CurPtr[0] == '/') {  // "//"
2211        skipSlashSlashComment(/*EatNewline=*/true);
....
2215        goto Restart;
2216      }
....
2264    case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
2265    case 'H': case 'I': case 'J': case 'K': case 'L': case 'M': case 'N':
2266    case 'O': case 'P': case 'Q': case 'R': case 'S': case 'T': case 'U':
2267    case 'V': case 'W': case 'X': case 'Y': case 'Z':
2268    case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
2269    case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
2270    case 'o': case 'p': case 'q': case 'r': case 's': case 't': case 'u':
2271    case 'v': case 'w': case 'x': case 'y': case 'z':
2272    case '_':
2273      return lexIdentifier();
....
2282    case '"':
2283    case '\'':
2284      return lexStringLiteral();
....
2289  }

When the Lexer is first initialized in the frontend when parsing hello.swift, the Lexer::primeLexer member function is called, which calls Lexer::lexImpl. This skips the // comments and whitespace at the beginning of hello.swift, until it reaches the 'p' character in "print", and so it enters the case in which the Lexer::lexIdentifier member function is called. This function consumes characters as long as they are valid identifier characters, and then creates a Token with a token kind of either tok::identifier, or a keyword token kind like tok::kw_if.

swift/lib/Parse/Lexer.cpp

591  /// lexIdentifier - Match [a-zA-Z_][a-zA-Z_$0-9]*
592  void Lexer::lexIdentifier() {
593    const char *TokStart = CurPtr-1;
594    CurPtr = TokStart;
595    bool didStart = advanceIfValidStartOfIdentifier(CurPtr, BufferEnd);
...  
599    // Lex [a-zA-Z_$0-9[[:XID_Continue:]]]*
600    while (advanceIfValidContinuationOfIdentifier(CurPtr, BufferEnd));
601  
602    tok Kind = kindOfIdentifier(StringRef(TokStart, CurPtr-TokStart), InSILMode);
603    return formToken(Kind, TokStart);
604  }

In the case of hello.swift, Lexer::lexIdentifier starts at 'p' and consumes the characters of "print", stopping when it sees the '(' character immediately after. It then calls Lexer::kindOfIdentifier to determine whether "print" is a keyword. The Lexer::kindOfIdentifier member function uses, sure enough, the macro and include trick:

swift/lib/Parse/Lexer.cpp

570  tok Lexer::kindOfIdentifier(StringRef Str, bool InSILMode) {
571    tok Kind = llvm::StringSwitch<tok>(Str)
572  #define KEYWORD(kw) \
573      .Case(#kw, tok::kw_##kw)
574  #include "swift/Syntax/TokenKinds.def"
575      .Default(tok::identifier);
...
588    return Kind;
589  }
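
To tie the pieces above together, here's a toy lexer written as a single function. It is only a sketch of the "switch on the first character" idea: the names ToyTok, ToyToken, and toyLex are hypothetical, it tracks an index rather than a CurPtr pointer, it only handles ASCII, and it leaves out keyword classification, escapes in string literals, and diagnostics. Unknown characters are silently skipped.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum class ToyTok { identifier, l_paren, r_paren, string_literal, eof };

struct ToyToken {
  ToyTok Kind;
  std::string Text;
};

inline bool toyIsIdentStart(char C) {
  return std::isalpha(static_cast<unsigned char>(C)) || C == '_';
}

inline bool toyIsIdentContinue(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_' || C == '$';
}

// Lexes one token starting at Src[Pos] and advances Pos past it.
inline ToyToken toyLex(const std::string &Src, std::size_t &Pos) {
  while (Pos < Src.size()) {
    const std::size_t TokStart = Pos;
    const char C = Src[Pos++];
    switch (C) {
    case ' ': case '\t': case '\n': case '\r':
      continue; // Skip whitespace and try again.
    case '(':
      return {ToyTok::l_paren, "("};
    case ')':
      return {ToyTok::r_paren, ")"};
    case '/':
      if (Pos < Src.size() && Src[Pos] == '/') {
        // A "//" comment: skip to the end of the line.
        while (Pos < Src.size() && Src[Pos] != '\n')
          ++Pos;
      }
      continue; // A lone '/' is ignored in this sketch.
    case '"':
      // Consume up to and including the closing quote, if there is one.
      while (Pos < Src.size() && Src[Pos] != '"')
        ++Pos;
      if (Pos < Src.size())
        ++Pos;
      return {ToyTok::string_literal, Src.substr(TokStart, Pos - TokStart)};
    default:
      if (toyIsIdentStart(C)) {
        while (Pos < Src.size() && toyIsIdentContinue(Src[Pos]))
          ++Pos;
        return {ToyTok::identifier, Src.substr(TokStart, Pos - TokStart)};
      }
      continue; // Unknown character: skipped in this sketch.
    }
  }
  return {ToyTok::eof, ""};
}
```

Calling toyLex repeatedly on the contents of hello.swift yields an identifier, a left parenthesis, a string literal, a right parenthesis, and then eof, which matches the hypothetical token dump shown earlier.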

So, as soon as the Lexer is initialized, the Lexer::primeLexer member function results in the Lexer::NextToken member being set to a token with kind tok::identifier, and with the text "print".

The next time the Parser requests a token from its Lexer, the Lexer will immediately return to it this "print" token and, if it is not at the end of the file, it'll prepare the next Lexer::NextToken in the same way, by calling Lexer::lexImpl again.
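
The "priming" behavior boils down to keeping a one-token buffer that is always filled ahead of time. Here's a minimal sketch of that idea, under some loud assumptions: ToyPrimedLexer is a made-up class, its "source" is a list of token texts that have already been split, and an empty string stands in for the eof token.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class ToyPrimedLexer {
  std::vector<std::string> Source; // Pre-split token texts (an assumption).
  std::size_t Pos = 0;
  std::string NextToken;           // The buffered token; "" means eof.

  // Fills NextToken with the next token, or with "" at the end of input.
  void lexImpl() {
    NextToken = Pos < Source.size() ? Source[Pos++] : std::string();
  }

public:
  explicit ToyPrimedLexer(std::vector<std::string> Src)
      : Source(std::move(Src)) {
    lexImpl(); // "Prime" the lexer: the first token is ready immediately.
  }

  // Looks at the buffered token without consuming it.
  const std::string &peekNextToken() const { return NextToken; }

  // Returns the buffered token and, unless it was eof, buffers the next one.
  std::string lex() {
    std::string Result = NextToken;
    if (!Result.empty())
      lexImpl();
    return Result;
  }
};
```

One consequence of this design is that whoever drives the lexer can always see one token ahead of the token it was just handed.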

Stage 5: The lex & parse loop

To recap: the libswiftFrontendTool function performFrontend parsed the arguments passed into our swift -frontend invocation and determined it would call the CompilerInstance::performSema member function. That eventually called the parseIntoSourceFile function, which instantiated a Parser and its internal Lexer (which resulted in Lexer being "primed" with the "print" token). Finally, parseIntoSourceFile calls Parser::parseTopLevel, which kicks off the lexing and parsing loop that results in the full syntax tree being constructed for our file.

swift/lib/Parse/ParseDecl.cpp

189  bool Parser::parseTopLevel() {
...
194      consumeTokenWithoutFeedingReceiver();
...  
234      parseBraceItems(Items,
235                      allowTopLevelCode() ? BraceItemListKind::TopLevelCode
236                                          : BraceItemListKind::TopLevelLibrary);
...
281  }

The Parser::parseTopLevel member function first requests the next token from the Lexer, by calling consumeTokenWithoutFeedingReceiver. This sets member Parser::Tok to the token for "print", and has the Lexer ready the next token, which is a tok::l_paren '('.

It then calls Parser::parseBraceItems. "Brace items" here refers to a sequence of expressions or statements that are contained within a pair of braces { ... }. Our hello.swift file does not contain braces, but when parsing top-level code in a file the Swift parser pretends that there is a set of braces wrapping that top-level code.

Parser::parseBraceItems enters a while loop that continues to parse the file until it reaches the file's end. Within that loop, it creates AST nodes for expressions or statements that it finds, by calling Parser::parseExprOrStmt. If it finds an expression or statement, it wraps it in a BraceStmt node:

swift/lib/Parse/ParseStmt.cpp

226  ParserStatus Parser::parseBraceItems(SmallVectorImpl<ASTNode> &Entries,
227                                       BraceItemListKind Kind,
228                                       BraceItemListKind ConditionalBlockKind) {
...
250    while ((IsTopLevel || Tok.isNot(tok::r_brace)) &&
...
254           Tok.isNot(tok::eof) &&
...
263            !isTerminatorForBraceItemListKind(Kind, Entries))) {
...
345        // If this is a statement or expression at the top level of the module,
346        // Parse it as a child of a TopLevelCodeDecl.
347        auto *TLCD = new (Context) TopLevelCodeDecl(CurDeclContext);
...
357        ParserStatus Status = parseExprOrStmt(Result);
...
373        if (!Result.isNull()) {
374          // NOTE: this is a 'virtual' brace statement which does not have
375          //       explicit '{' or '}', so the start and end locations should be
376          //       the same as those of the result node
377          auto Brace = BraceStmt::create(Context, Result.getStartLoc(),
378                                         Result, Result.getEndLoc());
379          TLCD->setBody(Brace);
...
381        }
...
424    }
...
427  }

With the creation of a TopLevelCodeDecl and a BraceStmt to wrap the result of the Parser::parseExprOrStmt member function call, the AST for hello.swift would look like this:

(source_file
  (top_level_code_decl
    (brace_stmt
      ...)))

The actual expression print("Hello, world!") will be parsed as part of the call to Parser::parseExprOrStmt.
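
To make the "virtual braces" idea concrete, here's a small sketch of the wrapping step on its own. It assumes the top-level items have already been parsed and are represented only by a name; ToyNode, toyDump, and toyParseTopLevel are hypothetical helpers, not Swift compiler APIs.

```cpp
#include <string>
#include <vector>

// A bare-bones tree node: a name plus any number of children.
struct ToyNode {
  std::string Name;
  std::vector<ToyNode> Children;
};

// Prints a node in the same parenthesized style as the AST dumps above.
inline std::string toyDump(const ToyNode &N) {
  std::string Out = "(" + N.Name;
  for (const ToyNode &Child : N.Children)
    Out += " " + toyDump(Child);
  return Out + ")";
}

// Wraps each top-level item in a brace statement that has no real '{' or
// '}', and wraps that brace statement in a top-level code declaration.
inline ToyNode toyParseTopLevel(const std::vector<std::string> &Items) {
  ToyNode File{"source_file", {}};
  for (const std::string &Item : Items) {
    ToyNode Brace{"brace_stmt", {ToyNode{Item, {}}}};
    ToyNode TLCD{"top_level_code_decl", {Brace}};
    File.Children.push_back(TLCD);
  }
  return File;
}
```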

5.1: A quick note on custom allocators in C++

Before getting into the Parser::parseExprOrStmt member function, take a closer look at the instantiation of TopLevelCodeDecl, which uses an interesting C++ feature:

swift/lib/Parse/ParseStmt.cpp

226  ParserStatus Parser::parseBraceItems(SmallVectorImpl<ASTNode> &Entries,
227                                       BraceItemListKind Kind,
228                                       BraceItemListKind ConditionalBlockKind) {
...
347        auto *TLCD = new (Context) TopLevelCodeDecl(CurDeclContext);
...
427  }

I didn't have any experience writing C++ before I began working on Swift and Clang, so this call to new (Context) TopLevelCodeDecl(...) confused me. I was familiar with expressions such as new Foo(), which allocate memory for an instance of Foo. But the expression new (Context) TopLevelCodeDecl(...) seemed to have an extra element: what is (Context) here?

It turns out that C++ allows you to provide overloads of the new operator for specific classes, and those overloads can take additional parameters. The new operator's first argument must be a size_t that indicates how many bytes should be allocated, but beyond that you can define an arbitrary list of parameters. Here, Context is an argument being passed to new.

Swift's Decl class not only defines a custom new operator that takes an ASTContext argument, it also deletes the default new operator implementation:

swift/include/swift/AST/Decl.h

235  /// Decl - Base class for all declarations in Swift.
236  class alignas(1 << DeclAlignInBits) Decl {
237  protected:
865  
866    // Make vanilla new/delete illegal for Decls.
867    void *operator new(size_t Bytes) = delete;
...  
870    // Only allow allocation of Decls using the allocator in ASTContext
871    // or by doing a placement new.
872    void *operator new(size_t Bytes, const ASTContext &C,
873                       unsigned Alignment = alignof(Decl));
...
878  };

This means that you cannot allocate memory for a Decl by calling new Decl() – you must call new (Context) Decl(). Doing so calls the ASTContext::Allocate member function:

swift/lib/AST/Decl.cpp

97  // Only allow allocation of Decls using the allocator in ASTContext.
98  void *Decl::operator new(size_t Bytes, const ASTContext &C,
99                           unsigned Alignment) {
100    return C.Allocate(Bytes, Alignment);
101  }

You may also have noticed this syntax earlier in this article, when CompilerInstance::performSema called through to a function that created the SourceFile root node in the AST. The SourceFile class inherits from DeclContext, which declares its own overload of the new operator:

swift/include/swift/AST/DeclContext.h

186  class alignas(1 << DeclContextAlignInBits) DeclContext {
...
554    // Only allow allocation of DeclContext using the allocator in ASTContext.
555    void *operator new(size_t Bytes, ASTContext &C,
556                       unsigned Alignment = alignof(DeclContext));
...
560  };

This also calls through to the ASTContext::Allocate function:

swift/lib/AST/DeclContext.cpp

38  // Only allow allocation of DeclContext using the allocator in ASTContext.
39  void *DeclContext::operator new(size_t Bytes, ASTContext &C,
40                                  unsigned Alignment) {
41    return C.Allocate(Bytes, Alignment);
42  }
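
Here's a minimal sketch of the same pattern outside of the Swift codebase. ToyContext and ToyDecl are hypothetical stand-ins for ASTContext and Decl. I'm assuming a very naive context that calls malloc for each node and frees everything in its destructor, which is not a claim about how ASTContext::Allocate is actually implemented.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for ASTContext: it owns every allocation made
// through it and frees them all at once when it is destroyed.
class ToyContext {
  std::vector<void *> Allocations;

public:
  ToyContext() = default;
  ToyContext(const ToyContext &) = delete;
  ToyContext &operator=(const ToyContext &) = delete;
  ~ToyContext() {
    for (void *P : Allocations)
      std::free(P);
  }

  void *Allocate(std::size_t Bytes) {
    void *P = std::malloc(Bytes);
    if (!P)
      throw std::bad_alloc();
    Allocations.push_back(P);
    return P;
  }

  std::size_t numAllocations() const { return Allocations.size(); }
};

class ToyDecl {
  int Value;

public:
  explicit ToyDecl(int V) : Value(V) {}
  int getValue() const { return Value; }

  // Make vanilla `new ToyDecl(...)` a compile-time error.
  void *operator new(std::size_t Bytes) = delete;

  // Only allow `new (Context) ToyDecl(...)`.
  void *operator new(std::size_t Bytes, ToyContext &C) {
    return C.Allocate(Bytes);
  }

  // Matching form, called only if the constructor throws. It does nothing,
  // because the context still owns the memory and will free it later.
  void operator delete(void *, ToyContext &) noexcept {}
};
```

With these definitions, new (Ctx) ToyDecl(42) compiles and allocates through the context, while new ToyDecl(42) is rejected by the compiler. Nodes are never deleted individually; their memory lives exactly as long as the context does.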

The ASTContext::Allocate member function is too interesting to explain in detail here. I'll write about it in a future article.

5.2: Parsing the print(...) expression

To parse the expression or statement at the top level of my hello.swift file, the Parser must first determine whether it's an expression or a statement. It makes use of a helper function Parser::isStartOfStmt to do so. For the most part, this just checks for keywords that clearly demarcate a statement:

swift/lib/Parse/ParseStmt.cpp

37  bool Parser::isStartOfStmt() {
38    switch (Tok.getKind()) {
39    default: return false;
40    case tok::kw_return:
41    case tok::kw_throw:
42    case tok::kw_defer:
43    case tok::kw_if:
44    case tok::kw_guard:
45    case tok::kw_while:
46    case tok::kw_do:
47    case tok::kw_repeat:
48    case tok::kw_for:
49    case tok::kw_break:
50    case tok::kw_continue:
51    case tok::kw_fallthrough:
52    case tok::kw_switch:
53    case tok::kw_case:
54    case tok::kw_default:
55    case tok::pound_if:
56    case tok::pound_sourceLocation:
57      return true;
..
85    }
86  }

Using this function, Parser::parseExprOrStmt determines that print(...) is an expression, and so it calls Parser::parseExpr:

swift/lib/Parse/ParseStmt.cpp

 88  ParserStatus Parser::parseExprOrStmt(ASTNode &Result) {
 ..    
 96    if (isStartOfStmt()) {
 97      ParserResult<Stmt> Res = parseStmt();
 98      if (Res.isNonNull())
 99        Result = Res.get();
100      return Res;
101    }
...  
117    ParserResult<Expr> ResultExpr = parseExpr(diag::expected_expr);
118    if (ResultExpr.isNonNull()) {
119      Result = ResultExpr.get();
120    } else if (!ResultExpr.hasCodeCompletion()) {
...
127    }
...  
133    return ResultExpr;
134  }
135  

Parsing an expression in Swift is complicated, because expressions can be composed of sequences of expressions of arbitrary length. For example, a && (b || (c && d)) is an expression itself, but (c && d) and (b || (c && d)) are also expressions on their own.

I'll skip over some of the complexity that this introduces in order to showcase just the path that the code takes when parsing our single-element expression sequence print("Hello, world!"). Suffice it to say that Parser::parseExpr eventually calls Parser::parseExprPostfixWithoutSuffix, a function that enters a large switch statement on the current token kind: for example, if the current token is a string literal, it calls parseExprStringLiteral.

swift/lib/Parse/ParseExpr.cpp

1463  ParserResult<Expr>
1464  Parser::parseExprPostfixWithoutSuffix(Diag<> ID, bool isExprBasic) {
....
1466    ParserResult<Expr> Result;
1467    switch (Tok.getKind()) {
....
1498    case tok::string_literal:  // "foo"
1499      Result = parseExprStringLiteral();
1500      break;
....        
1562    case tok::identifier:  // foo
....  
1598      LLVM_FALLTHROUGH;
1599    case tok::kw_self:     // self
1600    case tok::kw_Self:     // Self
1601      Result = makeParserResult(parseExprIdentifier());
1602  
1603      // If there is an expr-call-suffix, parse it and form a call.
1604      if (Tok.isFollowingLParen()) {
1605        Result = parseExprCallSuffix(Result, isExprBasic);
....
1607        break;
1608      }
....
1610      break;
.... 
1883    }
....  
1885    return Result;
1886  }

In the case of print(...), the Parser::parseExprPostfixWithoutSuffix function enters the tok::identifier case. In this case, it falls through to the same case as for self or Self, and so it parses the identifier with Parser::parseExprIdentifier. Then, if the next token is a left parenthesis '(', it parses a call expression by calling Parser::parseExprCallSuffix.

The Parser::parseExprIdentifier member function attempts to look up the identifier name in the current scope and, if it can't find a definition, it creates an UnresolvedDeclRefExpr node in the AST:

swift/lib/Parse/ParseExpr.cpp

2218  Expr *Parser::parseExprIdentifier() {
....
2249    ValueDecl *D = nullptr;
....
2251      D = lookupInScope(name);
....    
2278    Expr *E;
2279    if (D == nullptr) {
....
2286      E = new (Context) UnresolvedDeclRefExpr(name, refKind, loc);
....    }
2307    return E;
2308  }

An UnresolvedDeclRefExpr is an expression that references an "unresolved" identifier: one that the parser has not been able to match to a declaration. The identifier might not have been defined yet, or it may never be defined and so this code will fail during type-checking, or the identifier could have been defined in a different module.

In this case, print is defined in a different module: it's defined in Swift.swiftmodule, the Swift standard library module. As a result, any references to print will be treated as an UnresolvedDeclRefExpr, at least until the type-checker runs and the print identifier is read from the Swift module.

As explained above, Parser::parseExprPostfixWithoutSuffix parses the print identifier, then checks for an opening parenthesis '('. If one exists, it calls Parser::parseExprCallSuffix.

Just as with parsing expression sequences, parsing arguments to a call expression is complicated. Swift functions can take an arbitrary number of expressions as arguments, and each of those arguments can itself be a sequence of expressions. I won't go into the details here; the basic idea is that Parser::parseExprList parses a list of expressions that are passed into a call to CallExpr::create:

swift/lib/Parse/ParseExpr.cpp

3223  ParserResult<Expr>
3224  Parser::parseExprCallSuffix(ParserResult<Expr> fn, bool isExprBasic) {  
3227
3256    ParserStatus status = parseExprList(tok::l_paren, tok::r_paren,
3257                                        /*isPostfix=*/true, isExprBasic,
3258                                        lParenLoc, args, argLabels,
3259                                        argLabelLocs,
3260                                        rParenLoc,
3261                                        trailingClosure,
3262                                        SyntaxKind::FunctionCallArgumentList);
3263  
3265    auto Result = makeParserResult(status | fn, 
3266                                   CallExpr::create(Context, fn.get(), lParenLoc,
3267                                                    args, argLabels, argLabelLocs,
3268                                                    rParenLoc, trailingClosure,
3269                                                    /*implicit=*/false));
3278    return Result;
3279  }

In the case of hello.swift, Parser::parseExprList only finds a single expression. It eventually calls through to the string_literal case in the Parser::parseExprPostfixWithoutSuffix function posted above, which results in a call to Parser::parseExprStringLiteral and, in turn, the createStringLiteralExprFromSegment function:

swift/lib/Parse/ParseExpr.cpp

1888  static StringLiteralExpr *
1889  createStringLiteralExprFromSegment(ASTContext &Ctx,
1890                                     const Lexer *L,
1891                                     Lexer::StringSegment &Segment,
1892                                     SourceLoc TokenLoc) {
....
1902    return new (Ctx) StringLiteralExpr(EncodedStr, TokenLoc);
1903  }

The Parser::parseExprList function allocates a ParenExpr. Whatever expressions it parses, such as the StringLiteralExpr above, are nested within the ParenExpr.

All told, the call to Parser::parseExprOrStmt results in the following nodes being added to the AST:

(call_expr type='<null>' arg_labels=_:
  (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
  (paren_expr type='<null>'
    (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**)))

As shown above, the CallExpr result is then nested within the BraceStmt that is created within the Parser::parseBraceItems member function. So, in sum, the AST looks like this:

(source_file
  (top_level_code_decl
    (brace_stmt
      (call_expr type='<null>' arg_labels=_:
        (unresolved_decl_ref_expr type='<null>' name=print function_ref=unapplied)
        (paren_expr type='<null>'
          (string_literal_expr type='<null>' encoding=utf8 value="Hello, world!" builtin_initializer=**NULL** initializer=**NULL**))))))
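
The nesting in this dump mirrors the chain of calls that produced it. As a rough sketch of that flow, here's a toy parser that handles just enough syntax for hello.swift. Everything in it is hypothetical (ToyTok, ToyToken, ToyParser), it builds strings rather than real AST nodes, and it ignores commas, argument labels, trailing closures, and diagnostics; an empty string signals a parse error.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class ToyTok { identifier, l_paren, r_paren, string_literal, eof };

struct ToyToken {
  ToyTok Kind;
  std::string Text;
};

class ToyParser {
  std::vector<ToyToken> Tokens;
  std::size_t Pos = 0;

  const ToyToken &tok() const { return Tokens[Pos]; }
  void consume() {
    if (tok().Kind != ToyTok::eof)
      ++Pos;
  }

  // Called with the current token at '(': parses the arguments, wraps
  // them in a paren_expr, and forms a call_expr around Fn.
  std::string parseCallSuffix(const std::string &Fn) {
    consume(); // '('
    std::string Args = "(paren_expr";
    while (tok().Kind != ToyTok::r_paren && tok().Kind != ToyTok::eof) {
      std::string Arg = parseExpr();
      if (Arg.empty())
        return "";
      Args += " " + Arg;
    }
    if (tok().Kind != ToyTok::r_paren)
      return ""; // Missing ')'.
    consume(); // ')'
    return "(call_expr " + Fn + " " + Args + "))";
  }

public:
  explicit ToyParser(std::vector<ToyToken> T) : Tokens(std::move(T)) {
    // Guarantee a trailing eof so tok() is always valid.
    if (Tokens.empty() || Tokens.back().Kind != ToyTok::eof)
      Tokens.push_back({ToyTok::eof, ""});
  }

  // Switches on the current token kind, as the real parser does.
  std::string parseExpr() {
    switch (tok().Kind) {
    case ToyTok::string_literal: {
      std::string Result = "(string_literal_expr value=" + tok().Text + ")";
      consume();
      return Result;
    }
    case ToyTok::identifier: {
      std::string Result =
          "(unresolved_decl_ref_expr name=" + tok().Text + ")";
      consume();
      // If an opening parenthesis follows, form a call.
      if (tok().Kind == ToyTok::l_paren)
        return parseCallSuffix(Result);
      return Result;
    }
    default:
      return "";
    }
  }
};
```

Feeding this parser the four tokens of print("Hello, world!") produces a call_expr that contains an unresolved_decl_ref_expr and a paren_expr wrapping a string_literal_expr, the same shape as the dump above with the type and encoding details left out.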

Stage 6: Reaching the end of the file

When Parser::parseExprList is called, the Parser tasks the Lexer with consuming each token within the parentheses that come after print. The parsing and lexing stops at ')', at which point the Lexer::NextToken is set to tok::eof – the end of the file.

Recall that all this parsing was occurring within a while loop in Parser::parseBraceItems. One of the termination conditions for that while loop was encountering an EOF marker. Now that the end of the file has been reached, Parser::parseBraceItems returns control back to Parser::parseTopLevel, which returns control back to parseIntoSourceFile, which returns back to CompilerInstance::parseAndTypeCheckMainFile. The stage is now set for type-checking, which is done by calling the performTypeChecking function:

swift/lib/Frontend/Frontend.cpp

659  void CompilerInstance::parseAndTypeCheckMainFile(
660      PersistentParserState &PersistentState,
661      DelayedParsingCallbacks *DelayedParseCB,
662      OptionSet<TypeCheckingFlags> TypeCheckOptions) {
...
677    bool Done;
678    do {
...
683      parseIntoSourceFile(MainFile, MainFile.getBufferID().getValue(), &Done,
684                          TheSILModule ? &SILContext : nullptr, &PersistentState,
685                          DelayedParseCB);
...
688        performTypeChecking(MainFile, PersistentState.getTopLevelContext(),
689                            TypeCheckOptions, CurTUElem,
690                            options.WarnLongFunctionBodies,
691                            options.WarnLongExpressionTypeChecking,
692                            options.SolverExpressionTimeThreshold);
...
695    } while (!Done);
...
713  }

I'll cover type-checking in a future article.

Recap: lexing and parsing in the Swift compiler

This article covered a lot of ground:

  1. It explained that Swift's main function determines whether to call performFrontend by checking for the presence of the -frontend argument.
  2. If the -frontend argument is provided, performFrontend parses the other arguments passed into swift -frontend. It determines the requested action based on those arguments, and instantiates a CompilerInstance and ASTContext to carry out that action. If compilation is requested, it calls performCompile.
  3. performCompile checks the requested action and decides whether to call CompilerInstance::performSema (parsing and type-checking), or CompilerInstance::performParseOnly (just parsing). Our invocation of swift -frontend -c results in a requested action of ActionType::EmitObject, and so CompilerInstance::performSema is called.
  4. A Parser is initialized along with its Lexer. A Lexer reads characters and transforms strings of them into Token objects. A Parser drives the Lexer, based on the tokens it's seeing and its knowledge of Swift syntax. The parsing process begins when Parser::parseTopLevel is called. The Parser parses the print identifier as an UnresolvedDeclRefExpr. It sees an opening parenthesis immediately following the identifier, so it wraps the expression in a CallExpr. The arguments to the call expression are parsed: a single StringLiteralExpr, wrapped in a ParenExpr.
  5. The end of the file is reached, and so control goes back to CompilerInstance::performSema. Next stop: the type-checker!

To add new pieces of Swift syntax, or to modify existing Swift syntax, it's helpful to understand the libswiftParse source code. For example:

To learn more about how parsing works in the Swift compiler, try writing small programs and walking through the code in the compiler to see how they're parsed. A good way to do this is by attaching a debugger – read the instructions from the first article in this series, Getting Started with Swift Compiler Development, to learn how to do so. There's still a lot left to learn that I didn't cover in this article. Here are just two examples of parser mechanics that this article didn't cover: