How the Swift Compiler Emits Diagnostics, Part 2: Swift's Wrappers of LLVM Abstractions

In the last two articles in this series on Swift compiler development, I wrote about the abstractions provided by LLVM to emit diagnostics. How the Swift Compiler Emits Diagnostics, Part 1 explained how the llvm::SourceMgr class prints diagnostic messages along with lines of source code in an llvm::MemoryBuffer. Then I wrote about the internals of that llvm::MemoryBuffer class in How Swift and Clang Use LLVM to Read Files into Memory.

In this article, I explain how these abstractions come together in the Swift compiler. Specifically, I'll cover:

  1. How the Swift frontend reads source files in as llvm::MemoryBuffer instances.
  2. How the Swift frontend registers those buffers with a swift::SourceManager, a libswiftBasic wrapper around llvm::SourceMgr.
  3. How the Swift lexer records swift::SourceLoc locations as attributes on the tokens it lexes. (swift::SourceLoc is defined in libswiftBasic, as a wrapper around llvm::SMLoc.)
  4. How the Swift parser instructs swift::DiagnosticEngine and swift::SourceManager to print a diagnostic in the event it has encountered a parse error.

Together, these mechanisms produce the parser error I wrote about in the last article:

uhoh.swift

1  // uhoh.swift
2  
3  print(("Yikes.")

swiftc uhoh.swift
uhoh.swift:3:17: error: expected ')' in expression list
print(("Yikes.")
                ^
uhoh.swift:3:6: note: to match this opening '('
print(("Yikes.")
     ^

Finally, this article covers how the Swift compiler defines its diagnostic strings – for example, the "expected ')' in expression list" string above – by using macros. This is similar to the way token kinds are defined, as described in my article Getting Started with the Swift Frontend: Lexing & Parsing.

Step 1: Creating an llvm::MemoryBuffer for each input source file

My article An Introduction to the Swift Compiler Driver explained how an invocation of swift is normally split up, by libswiftDriver, into a series of swift -frontend and ld linker invocations. And in Getting Started with the Swift Frontend: Lexing & Parsing, I explained how an invocation of swift -frontend results in libswiftFrontendTool and libswiftFrontend instantiating a swift::CompilerInvocation, parsing command-line arguments via the swift::CompilerInvocation::parseArgs member function, and then using those parsed arguments to instantiate a swift::CompilerInstance. To recap, here's that code again:

swift/lib/FrontendTool/FrontendTool.cpp

1304  int swift::performFrontend(ArrayRef<const char *> Args,
1305                             const char *Argv0, void *MainAddr,
1306                             FrontendObserver *observer) {
....  
1348    std::unique_ptr<CompilerInstance> Instance =
1349      llvm::make_unique<CompilerInstance>();
....  
1371    CompilerInvocation Invocation;
....  
1379    // Parse arguments.
1380    if (Invocation.parseArgs(Args, Instance->getDiags(), workingDirectory)) {
1381      return finishDiagProcessing(1);
1382    }
....  
1464    if (Instance->setup(Invocation)) {
1465      return finishDiagProcessing(1);
1466    }
....  
1542  }

In Getting Started with the Swift Frontend: Lexing & Parsing, I pointed out that the swift::CompilerInstance::setup function instantiated a swift::ASTContext object (a crucially important object that represents the syntax tree of the Swift program being compiled). But it's also in this function that source files are read into llvm::MemoryBuffer objects. The setup function calls through to swift::CompilerInstance::setUpInputs, which calls setUpForInput, and so on, until eventually getInputBufferAndModuleDocBufferIfPresent instantiates an llvm::MemoryBuffer via the llvm::MemoryBuffer::getFileOrSTDIN function (which I explained in detail in the last article):

swift/lib/Frontend/Frontend.cpp

135  bool CompilerInstance::setup(const CompilerInvocation &Invok) {
136    Invocation = Invok;
...  
164    return setUpInputs();
165  }
...  
238  bool CompilerInstance::setUpInputs() {
...  
243    for (const InputFile &input :
244         Invocation.getFrontendOptions().InputsAndOutputs.getAllInputs())
245      if (setUpForInput(input))
246        return true;
...  
260  }
261  
262  bool CompilerInstance::setUpForInput(const InputFile &input) {
263    bool failed = false;
264    Optional<unsigned> bufferID = getRecordedBufferID(input, failed);
265    if (failed)
266      return true;
...  
280    return false;
281  }
282  
283  Optional<unsigned> CompilerInstance::getRecordedBufferID(const InputFile &input,
284                                                           bool &failed) {
...  
291    std::pair<std::unique_ptr<llvm::MemoryBuffer>,
292              std::unique_ptr<llvm::MemoryBuffer>>
293        buffers = getInputBufferAndModuleDocBufferIfPresent(input);
...  
308    // Transfer ownership of the MemoryBuffer to the SourceMgr.
309    unsigned bufferID = SourceMgr.addNewSourceBuffer(std::move(buffers.first));
...  
313  }
314  
315  std::pair<std::unique_ptr<llvm::MemoryBuffer>,
316            std::unique_ptr<llvm::MemoryBuffer>>
317  CompilerInstance::getInputBufferAndModuleDocBufferIfPresent(
318      const InputFile &input) {
...  
326    using FileOrError = llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>>;
327    FileOrError inputFileOrErr = llvm::MemoryBuffer::getFileOrSTDIN(input.file());
328    if (!inputFileOrErr) {
329      Diagnostics.diagnose(SourceLoc(), diag::error_open_input_file, input.file(),
330                           inputFileOrErr.getError().message());
331      return std::make_pair(nullptr, nullptr);
332    }
...  
342  }

Notice above that, after calling getInputBufferAndModuleDocBufferIfPresent to instantiate an llvm::MemoryBuffer, the swift::CompilerInstance::getRecordedBufferID function adds the buffer to the source manager.
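
To make the "buffer or error" shape of that code concrete, here's a minimal sketch that uses only the C++ standard library. BufferSketch, BufferOrErrorSketch, and readFileSketch are names I made up for illustration; they stand in for llvm::MemoryBuffer, llvm::ErrorOr, and llvm::MemoryBuffer::getFileOrSTDIN, and they skip everything the real classes do beyond reading a whole file into memory.

```cpp
#include <cassert>
#include <fstream>
#include <memory>
#include <sstream>
#include <string>

// Hypothetical stand-in for llvm::MemoryBuffer: owns a file's name and text.
struct BufferSketch {
  std::string name;
  std::string text;
};

// Hypothetical stand-in for llvm::ErrorOr<std::unique_ptr<...>>:
// holds either a buffer or an error message, never both.
struct BufferOrErrorSketch {
  std::unique_ptr<BufferSketch> buffer;
  std::string error;
};

// Reads the whole file at `path` into memory. On failure, returns a null
// buffer and a message the caller can turn into a diagnostic.
BufferOrErrorSketch readFileSketch(const std::string &path) {
  std::ifstream in(path, std::ios::binary);
  if (!in)
    return {nullptr, "cannot open '" + path + "'"};

  std::ostringstream contents;
  contents << in.rdbuf();
  return {std::make_unique<BufferSketch>(BufferSketch{path, contents.str()}),
          ""};
}
```

As in getInputBufferAndModuleDocBufferIfPresent, the caller checks for failure first, and only a successfully read buffer goes on to be registered with the source manager.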

Step 2: Registering the llvm::MemoryBuffer with the swift::SourceManager

The Swift codebase defines a library named libswiftBasic. It's the Swift codebase's equivalent of libLLVMSupport: a grab-bag of helper classes and functions without much in common.

Included in this helper library are the Swift classes swift::SourceManager and swift::SourceLoc, which wrap llvm::SourceMgr and llvm::SMLoc. They even define a lot of the same member functions, and in their implementations simply forward these along to the internal LLVM class – for example, swift::SourceManager::addNewSourceBuffer simply calls through to llvm::SourceMgr::AddNewSourceBuffer:

swift/include/swift/Basic/SourceManager.h

 22  namespace swift {
 23  
 24  /// \brief This class manages and owns source buffers.
 25  class SourceManager {
 26    llvm::SourceMgr LLVMSourceMgr;
 ..  
103    /// Adds a memory buffer to the SourceManager, taking ownership of it.
104    unsigned addNewSourceBuffer(std::unique_ptr<llvm::MemoryBuffer> Buffer);
...  
234  };
235  
236  } // end namespace swift

swift/lib/Basic/SourceLoc.cpp

42  unsigned
43  SourceManager::addNewSourceBuffer(std::unique_ptr<llvm::MemoryBuffer> Buffer) {
..  
46    auto ID = LLVMSourceMgr.AddNewSourceBuffer(std::move(Buffer), llvm::SMLoc());
..  
49  }

libswiftBasic wraps these LLVM abstractions in order to expose only some of their functions to the rest of the Swift codebase, or to add convenience methods. For example, swift::SourceLoc is documented as being defined mainly to deter other parts of the Swift codebase from calling llvm::SMLoc::getFromPointer:

swift/include/swift/Basic/SourceLoc.h

28  /// SourceLoc in swift is just an SMLoc.  We define it as a different type
29  /// (instead of as a typedef) just to remove the "getFromPointer" methods and
30  /// enforce purity in the Swift codebase.
31  class SourceLoc {
..  
80  };
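
The idea of wrapping a type in order to hide one of its functions can be sketched in a few lines. RawLocSketch and LocSketch below are hypothetical stand-ins for llvm::SMLoc and swift::SourceLoc; the member functions are my own simplification, not a copy of the real class.

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::SMLoc: a thin wrapper around a pointer
// into a source buffer. Anyone can mint one from a raw pointer.
class RawLocSketch {
  const char *ptr = nullptr;

public:
  static RawLocSketch getFromPointer(const char *p) {
    RawLocSketch loc;
    loc.ptr = p;
    return loc;
  }
  const char *getPointer() const { return ptr; }
  bool isValid() const { return ptr != nullptr; }
};

// Hypothetical stand-in for swift::SourceLoc. Because it is a distinct
// class rather than a typedef, it has no getFromPointer of its own;
// callers have to go through the explicit constructor.
class LocSketch {
  RawLocSketch value;

public:
  LocSketch() = default;
  explicit LocSketch(RawLocSketch v) : value(v) {}

  bool isValid() const { return value.isValid(); }

  // Convenience method: a location `byteOffset` bytes further along.
  LocSketch getAdvancedLoc(int byteOffset) const {
    assert(isValid() && "cannot advance an invalid location");
    return LocSketch(
        RawLocSketch::getFromPointer(value.getPointer() + byteOffset));
  }

  bool operator==(const LocSketch &other) const {
    return value.getPointer() == other.value.getPointer();
  }
  bool operator!=(const LocSketch &other) const { return !(*this == other); }
};
```

With a typedef, every function on the underlying type would leak through. With a separate class, only the functions the wrapper chooses to declare are available.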

So in step one above, after instantiating an llvm::MemoryBuffer in the swift::CompilerInstance::getInputBufferAndModuleDocBufferIfPresent function, the swift::CompilerInstance::getRecordedBufferID function calls swift::SourceManager::addNewSourceBuffer, a trivial wrapper around llvm::SourceMgr::AddNewSourceBuffer.

The swift::SourceManager is stored as a member variable on swift::CompilerInstance, called swift::CompilerInstance::SourceMgr – which is confusing, since that's the same spelling as the llvm::SourceMgr class. But one is a member variable, and the other is a class.

swift/lib/Frontend/Frontend.cpp

283  Optional<unsigned> CompilerInstance::getRecordedBufferID(const InputFile &input,
284                                                           bool &failed) {
...  
291    std::pair<std::unique_ptr<llvm::MemoryBuffer>,
292              std::unique_ptr<llvm::MemoryBuffer>>
293        buffers = getInputBufferAndModuleDocBufferIfPresent(input);
...  
308    // Transfer ownership of the MemoryBuffer to the SourceMgr.
309    unsigned bufferID = SourceMgr.addNewSourceBuffer(std::move(buffers.first));
...  
313  }
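
Here's a rough sketch of that ownership transfer, with a wrapper class forwarding to an inner one. None of these names exist in Swift or LLVM: RawSourceMgrSketch plays the role of llvm::SourceMgr, SourceManagerSketch plays the role of swift::SourceManager, and getText is an accessor I added so the result can be inspected. The 1-based buffer IDs are a choice made for this sketch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for llvm::MemoryBuffer.
struct BufferSketch {
  std::string name;
  std::string text;
};

// Hypothetical stand-in for llvm::SourceMgr: owns every buffer it is given.
class RawSourceMgrSketch {
  std::vector<std::unique_ptr<BufferSketch>> buffers;

public:
  // Takes ownership of the buffer and returns its ID. In this sketch IDs
  // are 1-based, which leaves 0 free to mean "no buffer".
  unsigned addBuffer(std::unique_ptr<BufferSketch> buffer) {
    buffers.push_back(std::move(buffer));
    return static_cast<unsigned>(buffers.size());
  }

  const BufferSketch &getBuffer(unsigned id) const {
    assert(id >= 1 && id <= buffers.size() && "invalid buffer ID");
    return *buffers[id - 1];
  }
};

// Hypothetical stand-in for swift::SourceManager: exposes a narrower
// interface and forwards to the inner manager.
class SourceManagerSketch {
  RawSourceMgrSketch raw;

public:
  unsigned addNewSourceBuffer(std::unique_ptr<BufferSketch> buffer) {
    return raw.addBuffer(std::move(buffer));
  }

  const std::string &getText(unsigned id) const {
    return raw.getBuffer(id).text;
  }
};
```

After the call, the caller's std::unique_ptr is empty and the buffer ID is the only handle it has left, which is why the rest of the frontend passes buffer IDs around rather than buffers.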

I explained in How the Swift Compiler Emits Diagnostics, Part 1: LLVM Abstractions how llvm::SourceMgr prints diagnostics at locations in an llvm::MemoryBuffer. The Swift lexer is responsible for recording those locations, by instantiating swift::SourceLoc objects (wrappers for llvm::SMLoc) as it lexes tokens.

Step 3: Recording the locations of lexed tokens

In Getting Started with the Swift Frontend: Lexing & Parsing I wrote about how libswiftFrontend eventually calls swift::parseIntoSourceFile, which instantiates a swift::Parser. The swift::Parser initializer instantiates a new swift::Lexer, passing in the swift::SourceManager:

swift/lib/Parse/Parser.cpp

329  Parser::Parser(unsigned BufferID, SourceFile &SF, SILParserTUStateBase *SIL,
330                 PersistentParserState *PersistentState)
331      : Parser(
332            std::unique_ptr<Lexer>(new Lexer(
333                SF.getASTContext().LangOpts, SF.getASTContext().SourceMgr,
334                BufferID, &SF.getASTContext().Diags,
335                /*InSILMode=*/SIL != nullptr,
336                SF.getASTContext().LangOpts.AttachCommentsToDecls
337                    ? CommentRetentionMode::AttachToNextToken
338                    : CommentRetentionMode::None,
339                SF.shouldKeepSyntaxInfo()
340                    ? TriviaRetentionMode::WithTrivia
341                    : TriviaRetentionMode::WithoutTrivia)),
342            SF, SIL, PersistentState) {}

Besides the swift::SourceManager, the swift::Lexer also takes a swift::DiagnosticEngine as a parameter to its initializer. This class is used to actually print the diagnostics – I'll cover it more in step 4 of this article.

In that article I also wrote about how the swift::Lexer iterates over each character in its input memory buffer in order to form tokens. For example, if it finds the character 'p' in its buffer, it'll attempt to lex an identifier. To do so, it advances its pointer into the buffer as long as it finds characters that form valid identifiers ('0' through '9', 'a' through 'z', 'A' through 'Z', underscores, etc.). Once it can advance no further, it calls swift::Lexer::formToken in order to reset the current swift::Token with the correct source location.

Here's the code that lexes an identifier and forms a token based on a pointer to its text:

swift/lib/Parse/Lexer.cpp

591  /// lexIdentifier - Match [a-zA-Z_][a-zA-Z_$0-9]*
592  void Lexer::lexIdentifier() {
593    const char *TokStart = CurPtr-1;
594    CurPtr = TokStart;
595    bool didStart = advanceIfValidStartOfIdentifier(CurPtr, BufferEnd);
...  
599    // Lex [a-zA-Z_$0-9[[:XID_Continue:]]]*
600    while (advanceIfValidContinuationOfIdentifier(CurPtr, BufferEnd));
601  
602    tok Kind = kindOfIdentifier(StringRef(TokStart, CurPtr-TokStart), InSILMode);
603    return formToken(Kind, TokStart);
604  }
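
The pointer-advancing approach is easier to see in a stripped-down form. The sketch below is my own simplification, not the real lexer: TokenSketch and lexIdentifierSketch are hypothetical names, and it only handles ASCII, whereas the real functions also accept Unicode identifier characters.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for swift::Token. It doesn't copy any text; it
// just remembers where in the buffer the token starts and how long it is.
struct TokenSketch {
  const char *start;
  std::size_t length;

  std::string str() const { return std::string(start, length); }
};

// ASCII-only approximation of [a-zA-Z_]
static bool isIdentifierStart(char c) {
  return std::isalpha(static_cast<unsigned char>(c)) || c == '_';
}

// ASCII-only approximation of [a-zA-Z_$0-9]
static bool isIdentifierContinuation(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_' || c == '$';
}

// Lexes an identifier beginning at `cur`, advancing `cur` past it.
// If `cur` isn't at the start of an identifier, the token is empty.
TokenSketch lexIdentifierSketch(const char *&cur, const char *end) {
  const char *tokStart = cur;
  if (cur != end && isIdentifierStart(*cur)) {
    ++cur;
    while (cur != end && isIdentifierContinuation(*cur))
      ++cur;
  }
  return TokenSketch{tokStart, static_cast<std::size_t>(cur - tokStart)};
}
```

Because the token stores a pointer into the buffer rather than a copy of the text, its source location comes for free: the start pointer is the location.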

The swift::Token::getLoc member function can then be used to instantiate an llvm::SMLoc and swift::SourceLoc based on the buffer pointer:

swift/include/swift/Parse/Token.h

 34  class Token {
...  
221    /// getLoc - Return a source location identifier for the specified
222    /// offset in the current file.
223    SourceLoc getLoc() const {
224      return SourceLoc(llvm::SMLoc::getFromPointer(Text.begin()));
225    }
...  
280  };
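
A location that is just a pointer has to be turned into a line and column before it can be printed as "uhoh.swift:3:6". In the compiler that work belongs to llvm::SourceMgr, which I covered in Part 1. The sketch below shows the idea with a simple linear scan; getLineAndColumnSketch is a hypothetical helper, and I'm not claiming LLVM computes it this way.

```cpp
#include <cassert>
#include <string>

struct LineAndColumnSketch {
  unsigned line;   // 1-based
  unsigned column; // 1-based
};

// Converts a pointer into `buffer` to a 1-based line and column by
// counting the newlines that precede it.
LineAndColumnSketch getLineAndColumnSketch(const std::string &buffer,
                                           const char *loc) {
  assert(loc >= buffer.data() && loc <= buffer.data() + buffer.size() &&
         "location must point into the buffer");
  unsigned line = 1;
  unsigned column = 1;
  for (const char *p = buffer.data(); p != loc; ++p) {
    if (*p == '\n') {
      ++line;
      column = 1;
    } else {
      ++column;
    }
  }
  return LineAndColumnSketch{line, column};
}
```

For the uhoh.swift example, a pointer to the first '(' resolves to line 3, column 6, which matches the location printed in the note diagnostic.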

So that explains how the lexer constructs tokens and their locations, how the frontend instantiates memory buffers for each of the input files, and how those memory buffers are owned and managed by a swift::SourceManager. Finally, the parser calls through to llvm::SourceMgr::PrintMessage in order to actually print the diagnostic – just as my sample program did in How the Swift Compiler Emits Diagnostics, Part 1.

But the Swift compiler uses several layers of indirection to do so: swift::DiagnosticEngine, swift::Diagnostic, swift::InFlightDiagnostic, and more. The next and final step in this article covers those classes in detail.

Step 4: Printing the diagnostic

When the swift::Parser parses invalid Swift source code and determines it must print a diagnostic, it doesn't call llvm::SourceMgr::PrintMessage directly. As an example, let's take a closer look at the note diagnostic from the beginning of this article:

uhoh.swift:3:6: note: to match this opening '('
print(("Yikes.")
     ^

This diagnostic is printed because the parser encounters the opening '(' token, and its Parser::parseMatchingToken function kicks off a loop that tries to parse an expression list and a closing ')' token. If a closing ')' token isn't found, it calls Parser::diagnose, passing in two arguments: the location of the '(' token, and the diagnostic to print:

swift/lib/Parse/Parser.cpp

 858  /// parseMatchingToken - Parse the specified expected token and return its
 859  /// location on success.  On failure, emit the specified error diagnostic, and a
 860  /// note at the specified note location.
 861  bool Parser::parseMatchingToken(tok K, SourceLoc &TokLoc, Diag<> ErrorDiag,
 862                                  SourceLoc OtherLoc) {
 863    Diag<> OtherNote;
 864    switch (K) {
 865    case tok::r_paren: OtherNote = diag::opening_paren; break;
 866    case tok::r_square: OtherNote = diag::opening_bracket; break;
 867    case tok::r_brace: OtherNote = diag::opening_brace; break;
 868    default: llvm_unreachable("unknown matching token!"); break;
 869    }
 870    if (parseToken(K, TokLoc, ErrorDiag)) {
 871      diagnose(OtherLoc, OtherNote);
 872  
 873      TokLoc = PreviousLoc;
 874      return true;
 875    }
 876  
 877    return false;
 878  }

Notice that the note diagnostic text isn't represented with a string, but instead with a type swift::Diag<>, set to the value diag::opening_paren. This value is defined using macros:

swift/include/swift/AST/DiagnosticsParse.def

34  #ifndef NOTE
35  #  define NOTE(ID,Options,Text,Signature) \
36    DIAG(NOTE,ID,Options,Text,Signature)
37  #endif
..  
47  NOTE(opening_paren,none,
48       "to match this opening '('", ())

Another file defines the DIAG macro and then includes this file, in order to define a new global variable, named swift::diag::opening_paren:

swift/include/swift/AST/DiagnosticsParse.h

23  namespace swift {
24    namespace diag {
25    // Declare common diagnostics objects with their appropriate types.
26  #define DIAG(KIND,ID,Options,Text,Signature) \
27    extern detail::DiagWithArguments<void Signature>::type ID;
28  #include "DiagnosticsParse.def"
29    }
30  }

The extern variable opening_paren is an instance of swift::detail::DiagWithArguments<void ()>::type, which is a type alias for swift::Diag<void ()>. The template parameter void () indicates that the diagnostic takes no arguments.

There are other swift::diag kinds that do take arguments, such as operator_static_in_protocol, which takes a StringRef for the operator name:

swift/include/swift/AST/DiagnosticsParse.def

345  ERROR(operator_static_in_protocol,none,
346        "operator '%0' declared in protocol must be 'static'",
347        (StringRef))

In a future article, I'll cover how these arguments are used. For now, I'll focus on opening_paren, which takes no arguments.
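
This "define a macro, then include a list" technique is often called an X-macro. The real compiler keeps its list in a .def file and includes it more than once; to stay within a single file, the sketch below keeps the list in a function-like macro instead. SKETCH_DIAGNOSTICS, DiagIDSketch, KindSketch, and the two lookup functions are hypothetical, and the three entries are there only for illustration.

```cpp
#include <cassert>
#include <string>

// The single source of truth: one line per diagnostic. Each entry is
// DIAG(kind, identifier, text).
#define SKETCH_DIAGNOSTICS(DIAG)                                   \
  DIAG(Note, opening_paren, "to match this opening '('")           \
  DIAG(Note, opening_bracket, "to match this opening '['")         \
  DIAG(Error, expected_rparen, "expected ')' in expression list")

enum class KindSketch { Note, Error };

// First expansion: turn each entry into an enumerator.
enum class DiagIDSketch : unsigned {
#define DIAG(KIND, ID, TEXT) ID,
  SKETCH_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Second expansion: a table of strings, in the same order as the enum.
static const char *const diagnosticStrings[] = {
#define DIAG(KIND, ID, TEXT) TEXT,
    SKETCH_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Third expansion: a table of kinds, again in the same order.
static const KindSketch diagnosticKinds[] = {
#define DIAG(KIND, ID, TEXT) KindSketch::KIND,
    SKETCH_DIAGNOSTICS(DIAG)
#undef DIAG
};

// Looks up the text for a diagnostic by using its ID as a table index.
std::string diagnosticStringForSketch(DiagIDSketch id) {
  return diagnosticStrings[static_cast<unsigned>(id)];
}

KindSketch diagnosticKindForSketch(DiagIDSketch id) {
  return diagnosticKinds[static_cast<unsigned>(id)];
}
```

The payoff is that adding a diagnostic means adding one line to the list; the enumerator, the string, and the kind can't drift out of sync because they're all generated from the same entry.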

As shown above, Parser::parseMatchingToken calls Parser::diagnose with the location of the opening parenthesis token and the diag::opening_paren. This function instantiates a swift::Diagnostic based on the swift::Diag<>, and calls through to swift::DiagnosticEngine::diagnose (recall above that the swift::Parser was instantiated with an instance of swift::DiagnosticEngine, a member of swift::ASTContext):

swift/include/swift/Parse/Parser.h

545    InFlightDiagnostic diagnose(SourceLoc Loc, Diagnostic Diag) {
...  
549      return Diags.diagnose(Loc, Diag);
550    }
...  
556    template<typename ...DiagArgTypes, typename ...ArgTypes>
557    InFlightDiagnostic diagnose(SourceLoc Loc, Diag<DiagArgTypes...> DiagID,
558                                ArgTypes &&...Args) {
559      return diagnose(Loc, Diagnostic(DiagID, std::forward<ArgTypes>(Args)...));
560    }

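The Swift compiler defines several abstractions around diagnostics, and it can be hard to keep track of them all. So far I've covered swift::Diag<> and swift::detail::DiagWithArguments<>::type, which represent a single unique diagnostic. The code snippet above introduces swift::Diagnostic, swift::DiagnosticEngine, swift::DiagnosticConsumer, swift::DiagnosticInfo, and swift::InFlightDiagnostic. I'll introduce them each before stepping through the code:

  1. swift::Diagnostic pairs a swift::Diag<> with the arguments for that diagnostic, along with the source location, ranges, and fix-its attached to it.
  2. swift::DiagnosticEngine holds the currently active swift::Diagnostic and a list of consumers, and hands each diagnostic to those consumers when it is emitted.
  3. swift::DiagnosticConsumer receives emitted diagnostics through its handleDiagnostic function and decides what to do with them, such as printing them.
  4. swift::DiagnosticInfo bundles a diagnostic's ID, ranges, and fix-its for the consumers.
  5. swift::InFlightDiagnostic is a handle returned to the caller of diagnose. It isn't itself a diagnostic; it exists so that the caller can attach more information before the active diagnostic is emitted.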

That's a lot of abstraction, but I found that after I followed the code, things became a little clearer. Recall that the swift::Parser::diagnose function above instantiated a swift::Diagnostic for diag::opening_paren, and then called swift::DiagnosticEngine::diagnose:

swift/include/swift/AST/DiagnosticEngine.h

646      /// \brief Emit an already-constructed diagnostic at the given location.
647      ///
648      /// \param Loc The location to which the diagnostic refers in the source
649      /// code.
650      ///
651      /// \param D The diagnostic.
652      ///
653      /// \returns An in-flight diagnostic, to which additional information can
654      /// be attached.
655      InFlightDiagnostic diagnose(SourceLoc Loc, const Diagnostic &D) {
656        assert(!ActiveDiagnostic && "Already have an active diagnostic");
657        ActiveDiagnostic = D;
658        ActiveDiagnostic->setLoc(Loc);
659        return InFlightDiagnostic(*this);
660      }

Here the swift::DiagnosticEngine sets its ActiveDiagnostic to the swift::Diagnostic that it was given. It then instantiates a swift::InFlightDiagnostic and returns it.

Again, the swift::InFlightDiagnostic being returned here isn't actually a "diagnostic." It doesn't hold any information about what text should be printed out or at which location. Its only purpose is to be returned to the caller – the Parser::parseMatchingToken function – in case that caller wants to attach additional fix-its to the swift::DiagnosticEngine::ActiveDiagnostic.

In this case, the Parser::parseMatchingToken function doesn't store the InFlightDiagnostic at all. So it immediately goes out of scope, and its destructor is called:

swift/lib/Parse/Parser.cpp

861  bool Parser::parseMatchingToken(tok K, SourceLoc &TokLoc, Diag<> ErrorDiag,
862                                  SourceLoc OtherLoc) {
...  
870    if (parseToken(K, TokLoc, ErrorDiag)) {
871      diagnose(OtherLoc, OtherNote);
...  
878  }

swift/include/swift/AST/DiagnosticEngine.h

393      ~InFlightDiagnostic() {
394        if (IsActive)
395          flush();
396      }
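
This "emit when the handle is destroyed" trick took me a while to internalize, so here's a small model of it. EngineSketch and InFlightSketch are hypothetical classes that loosely mirror swift::DiagnosticEngine and swift::InFlightDiagnostic. Fix-its are reduced to plain strings, and "emitting" just appends to a vector.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

class EngineSketch;

// A handle to the engine's active diagnostic. It carries no diagnostic
// data itself; destroying it is what triggers emission.
class InFlightSketch {
  EngineSketch *engine;
  bool isActive = true;

public:
  explicit InFlightSketch(EngineSketch &e) : engine(&e) {}
  // Moving transfers responsibility for flushing to the new handle.
  InFlightSketch(InFlightSketch &&other)
      : engine(other.engine), isActive(other.isActive) {
    other.isActive = false;
  }
  InFlightSketch(const InFlightSketch &) = delete;
  InFlightSketch &operator=(const InFlightSketch &) = delete;

  ~InFlightSketch() {
    if (isActive)
      flush();
  }

  InFlightSketch &fixItInsert(std::string text);
  void flush();
};

class EngineSketch {
  bool hasActive = false;
  std::string activeText;
  std::vector<std::string> activeFixIts;

public:
  std::vector<std::string> emitted; // stands in for the consumers

  InFlightSketch diagnose(std::string text) {
    assert(!hasActive && "Already have an active diagnostic");
    activeText = std::move(text);
    activeFixIts.clear();
    hasActive = true;
    return InFlightSketch(*this);
  }

  void addFixIt(std::string fixIt) { activeFixIts.push_back(std::move(fixIt)); }

  void flushActiveDiagnostic() {
    std::string line = activeText;
    for (const std::string &fixIt : activeFixIts)
      line += " [fix-it: " + fixIt + "]";
    emitted.push_back(line);
    hasActive = false;
  }
};

inline InFlightSketch &InFlightSketch::fixItInsert(std::string text) {
  engine->addFixIt(std::move(text));
  return *this;
}

inline void InFlightSketch::flush() {
  if (!isActive)
    return;
  isActive = false;
  engine->flushActiveDiagnostic();
}
```

A caller that writes `engine.diagnose("...");` and ignores the result gets the diagnostic emitted at the end of that statement, when the temporary handle is destroyed. A caller that chains `.fixItInsert(")")` onto the result has its fix-it recorded first, because the temporary lives until the whole expression finishes.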

The InFlightDiagnostic::flush member function calls through to DiagnosticEngine::flushActiveDiagnostic, which loops over each diagnostic consumer to call DiagnosticConsumer::handleDiagnostic, passing them a reference to the swift::SourceManager, along with a DiagnosticInfo instance:

swift/lib/AST/DiagnosticEngine.cpp

241  void InFlightDiagnostic::flush() {
...  
247      Engine->flushActiveDiagnostic();
248  }
...  
682  void DiagnosticEngine::flushActiveDiagnostic() {
...  
685      emitDiagnostic(*ActiveDiagnostic);
...  
690  }
...  
699  void DiagnosticEngine::emitDiagnostic(const Diagnostic &diagnostic) {
...  
705    SourceLoc loc = diagnostic.getLoc();
...  
818    // Pass the diagnostic off to the consumer.
819    DiagnosticInfo Info;
820    Info.ID = diagnostic.getID();
821    Info.Ranges = diagnostic.getRanges();
822    Info.FixIts = diagnostic.getFixIts();
823    for (auto &Consumer : Consumers) {
824      Consumer->handleDiagnostic(SourceMgr, loc, toDiagnosticKind(behavior),
825                                 diagnosticStringFor(Info.ID),
826                                 diagnostic.getArgs(), Info);
827    }
828  }

The announcer and listener pattern employed by DiagnosticEngine and its DiagnosticConsumer instances allows the Swift compiler to handle diagnostics in different ways. For example, when I invoke swiftc -serialize-diagnostics, libswiftFrontendTool registers a swift::SerializedDiagnosticConsumer instance, which writes diagnostics data to a file.
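
As a rough illustration of that pattern, the sketch below has one engine broadcasting to two consumers. All of the class names are hypothetical, and handleDiagnostic is cut down to two strings; the real function takes the source manager, a location, a kind, a format string, arguments, and a DiagnosticInfo.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for swift::DiagnosticConsumer.
struct ConsumerSketch {
  virtual ~ConsumerSketch() = default;
  virtual void handleDiagnostic(const std::string &kind,
                                const std::string &text) = 0;
};

// Writes each diagnostic to a stream, in the spirit of
// swift::PrintingDiagnosticConsumer.
class PrintingConsumerSketch : public ConsumerSketch {
  std::ostream &out;

public:
  explicit PrintingConsumerSketch(std::ostream &o) : out(o) {}
  void handleDiagnostic(const std::string &kind,
                        const std::string &text) override {
    out << kind << ": " << text << '\n';
  }
};

// Keeps diagnostics in memory instead of printing them. A consumer that
// serializes to a file would have the same shape.
class RecordingConsumerSketch : public ConsumerSketch {
public:
  std::vector<std::string> records;
  void handleDiagnostic(const std::string &kind,
                        const std::string &text) override {
    records.push_back(kind + ": " + text);
  }
};

// Hypothetical stand-in for the consumer-facing half of
// swift::DiagnosticEngine. It doesn't own its consumers.
class BroadcastingEngineSketch {
  std::vector<ConsumerSketch *> consumers;

public:
  void addConsumer(ConsumerSketch *consumer) { consumers.push_back(consumer); }

  void emit(const std::string &kind, const std::string &text) {
    for (ConsumerSketch *consumer : consumers)
      consumer->handleDiagnostic(kind, text);
  }
};
```

The engine never needs to know what a consumer does with a diagnostic, so supporting a new output format means writing a new consumer rather than touching the parser or the engine.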

By default, libswiftFrontendTool registers a swift::PrintingDiagnosticConsumer, in the swift::performFrontend function:

swift/lib/FrontendTool/FrontendTool.cpp

1607  int swift::performFrontend(ArrayRef<const char *> Args,
1608                             const char *Argv0, void *MainAddr,
1609                             FrontendObserver *observer) {
....  
1612    PrintingDiagnosticConsumer PDC;
....  
1648    std::unique_ptr<CompilerInstance> Instance =
1649      llvm::make_unique<CompilerInstance>();
1650    Instance->addDiagnosticConsumer(&PDC);
....  
1824  }

When DiagnosticEngine::emitDiagnostic invokes the PrintingDiagnosticConsumer::handleDiagnostic function, the llvm::SourceMgr functions described in How the Swift Compiler Emits Diagnostics, Part 1 are called. llvm::SourceMgr::GetMessage is used to get the text to print, and llvm::SourceMgr::PrintMessage is used to output that text to the console:

swift/lib/Frontend/PrintingDiagnosticConsumer.cpp

 66  void PrintingDiagnosticConsumer::handleDiagnostic(
 67      SourceManager &SM, SourceLoc Loc, DiagnosticKind Kind,
 68      StringRef FormatString, ArrayRef<DiagnosticArgument> FormatArgs,
 69      const DiagnosticInfo &Info) {
 70    // Determine what kind of diagnostic we're emitting.
 71    llvm::SourceMgr::DiagKind SMKind;
 72    switch (Kind) {
 73      case DiagnosticKind::Error:
 74        SMKind = llvm::SourceMgr::DK_Error;
 75        break;
 76      case DiagnosticKind::Warning: 
 77        SMKind = llvm::SourceMgr::DK_Warning; 
 78        break;
 79  
 80      case DiagnosticKind::Note:
 81        SMKind = llvm::SourceMgr::DK_Note;
 82        break;
 83  
 84      case DiagnosticKind::Remark:
 85        SMKind = llvm::SourceMgr::DK_Remark;
 86        break;
 87    }
 ..  
106    const llvm::SourceMgr &rawSM = SM.getLLVMSourceMgr();
107    
108    // Actually substitute the diagnostic arguments into the diagnostic text.
109    llvm::SmallString<256> Text;
110    {
111      llvm::raw_svector_ostream Out(Text);
112      DiagnosticEngine::formatDiagnosticText(Out, FormatString, FormatArgs);
113    }
114    
115    auto Msg = SM.GetMessage(Loc, SMKind, Text, Ranges, FixIts);
116    rawSM.PrintMessage(out, Msg);
117  }
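
To tie those last two steps together, here's a sketch of substituting arguments into a format string and then rendering the message with a caret underneath the source line. Both functions are hypothetical simplifications: formatTextSketch only understands %0 through %9 with string arguments, and renderDiagnosticSketch takes the source line as a parameter instead of looking it up in a buffer the way llvm::SourceMgr does.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Replaces each %N (a single digit) in `format` with args[N].
std::string formatTextSketch(const std::string &format,
                             const std::vector<std::string> &args) {
  std::string out;
  for (std::size_t i = 0; i < format.size(); ++i) {
    if (format[i] == '%' && i + 1 < format.size() &&
        std::isdigit(static_cast<unsigned char>(format[i + 1]))) {
      std::size_t index = static_cast<std::size_t>(format[i + 1] - '0');
      assert(index < args.size() && "not enough arguments for format string");
      out += args[index];
      ++i; // skip the digit
    } else {
      out += format[i];
    }
  }
  return out;
}

// Renders "file:line:column: kind: message", the source line, and a caret
// under the given 1-based column.
std::string renderDiagnosticSketch(const std::string &file, unsigned line,
                                   unsigned column, const std::string &kind,
                                   const std::string &message,
                                   const std::string &sourceLine) {
  assert(column >= 1 && "columns are 1-based");
  std::ostringstream out;
  out << file << ':' << line << ':' << column << ": " << kind << ": "
      << message << '\n'
      << sourceLine << '\n'
      << std::string(column - 1, ' ') << "^\n";
  return out.str();
}
```

Given the file name uhoh.swift, line 3, column 6, the kind "note", and the opening_paren text, renderDiagnosticSketch produces the same three lines as the note at the beginning of this article.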

Summarizing the last three articles

There's a lot of different machinery involved in printing diagnostics, spanning both the Swift and LLVM codebases:

  1. The Swift frontend uses libLLVMOption to parse command-line arguments. The frontend instantiates an llvm::MemoryBuffer for each command-line argument that is determined to be an input file path. An llvm::MemoryBuffer either reads the entire file into memory, or it uses the operating system call mmap to read chunks of it into memory as needed.
  2. The Swift frontend registers each instantiated llvm::MemoryBuffer with a swift::SourceManager. This is a libswiftBasic wrapper around llvm::SourceMgr. libLLVMSupport implements llvm::SourceMgr, which defines logic to print a line of source code from an llvm::MemoryBuffer, along with carets ^ or underlines ~~~~~ at specific locations or ranges, plus some arbitrary text.
  3. The Swift lexer records swift::SourceLoc locations as attributes on the tokens it lexes. swift::SourceLoc is defined in libswiftBasic, as a wrapper around llvm::SMLoc.
  4. The Swift parser is instantiated with a swift::DiagnosticEngine, which in turn holds a reference to the swift::SourceManager that owns the llvm::MemoryBuffer objects. When the Swift parser encounters a series of tokens that don't fit Swift's grammar rules, it instructs its swift::DiagnosticEngine to print a diagnostic.
  5. The swift::DiagnosticEngine uses an announcer & listener pattern to call handleDiagnostic on each of its swift::DiagnosticConsumer instances, passing them a reference to the swift::SourceManager and some information about the diagnostic.
  6. The default diagnostic consumer is swift::PrintingDiagnosticConsumer, which calls llvm::SourceMgr::PrintMessage in order to print the diagnostic text to the console.