How the Swift Compiler Emits Diagnostics, Part 1: LLVM Abstractions

The prior article in this series explained how the Swift compiler lexes and parses source code, by looking at this syntactically valid Swift program:

hello.swift

1  // hello.swift
2
3  print("Hello, world!")

In this and the next few articles, I'd like to take a look at this ill-formed Swift program (note the extra opening parenthesis):

uhoh.swift

1  // uhoh.swift
2
3  print(("Yikes.")

Compiling this program results in two diagnostics: one error diagnostic, and one note diagnostic:

swiftc uhoh.swift
uhoh.swift:3:17: error: expected ')' in expression list
print(("Yikes.")
                ^
uhoh.swift:3:6: note: to match this opening '('
print(("Yikes.")
     ^

As I explained in the previous article, the Swift parser instructs its lexer to lex tokens in the source program. It inspects each token, and determines what tokens need to be lexed next in order for the program to be valid. This is what I imagine the parser is "thinking" when it parses the program above:

  1. "I instruct the lexer to lex the first token in the source code. The lexer returns to me that token, which is print. I know that this is an identifier. I need to keep parsing to determine whether this identifier is being used as a variable, as part of a function call, or in some other way. To find out, I'll instruct the lexer to lex another token."
  2. "The lexer has lexed the token (, an opening parenthesis. I am probably parsing a function call, and as such I expect to find a list of zero or more expressions – the function arguments – followed by a closing parenthesis ). I'll instruct the lexer to lex another token."
  3. "The lexer has lexed the token (, another opening parenthesis. OK, so maybe there's a tuple being passed in as a function argument, such as print((true, 1)) or something." (Keep in mind that the parser doesn't know anything about whether the print function provided by the Swift standard library can take a tuple as an argument – that's something for the Swift type-checker, in libswiftSema, to determine.) "So now I need to parse another list of zero or more expressions to fit inside this tuple or whatever, and then find two closing parentheses: one for the print(, and one for the print((. I'll instruct the lexer to lex another token."
  4. "The lexer has lexed the string literal token "Yikes.". OK, I guess this is the first expression inside the tuple. Let's keep going: I instruct the lexer to lex another token."
  5. "The lexer has lexed the token ), a closing parenthesis. Aha, so it looks like I've parsed a tuple: ("Yikes."). OK, so that's the first expression in a list of expressions being passed as arguments to print(. I need to keep parsing expressions in that list of arguments, and then find a closing parenthesis ) to end the list. I instruct the lexer to lex another token."
  6. "The lexer has lexed the token… eof?? End of file? What the heck? Where's the closing parenthesis for print(? OK, this source code is messed up. I should let the user know by emitting a diagnostic."
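
The bookkeeping in the steps above (remember where each opening parenthesis was found, and complain at the end of the input if one was never closed) can be sketched in a few lines of standard C++. This is only a toy illustration and not how the Swift parser is implemented: the names ParenDiagnostic and checkParens are hypothetical, it scans characters rather than tokens, and it ignores escape sequences inside string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one diagnostic on a single line of source text.
struct ParenDiagnostic {
  std::string Kind;    // "error" or "note"
  std::size_t Column;  // 1-based column, as in "uhoh.swift:3:17"
  std::string Message;
};

// Scans one line, remembering the offset of every '(' that is still open.
// If the line ends with a '(' still open, this reports an error just past
// the end of the line and a note pointing back at the unmatched '('.
std::vector<ParenDiagnostic> checkParens(const std::string &Line) {
  std::vector<std::size_t> OpenOffsets;  // offsets of '(' not yet closed
  bool InString = false;                 // toggled by '"'; escapes ignored
  for (std::size_t I = 0; I < Line.size(); ++I) {
    const char C = Line[I];
    if (C == '"') {
      InString = !InString;
    } else if (!InString && C == '(') {
      OpenOffsets.push_back(I);
    } else if (!InString && C == ')' && !OpenOffsets.empty()) {
      OpenOffsets.pop_back();
    }
  }

  std::vector<ParenDiagnostic> Diagnostics;
  if (!OpenOffsets.empty()) {
    // Offsets are 0-based, columns are 1-based, hence the "+ 1".
    Diagnostics.push_back(
        {"error", Line.size() + 1, "expected ')' in expression list"});
    Diagnostics.push_back(
        {"note", OpenOffsets.back() + 1, "to match this opening '('"});
  }
  return Diagnostics;
}
```

Calling checkParens on the text print(("Yikes.") yields an error at column 17 and a note at column 6, the same columns swiftc reports above.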

I explained the mechanics of steps 1 through 5 above in the previous article. In this article, I'll skim over those steps and focus instead on the diagnostic output itself:

swiftc uhoh.swift
uhoh.swift:3:17: error: expected ')' in expression list
print(("Yikes.")
                ^
uhoh.swift:3:6: note: to match this opening '('
print(("Yikes.")
     ^

How does the parser know where to place the ^ carets in the output above? How is the text actually printed? Where is that logic defined?

It turns out the answers to these questions span both the Swift and LLVM codebases.

Representing source text and locations, and emitting diagnostics, using libLLVMSupport

You may recall from my article on Option Parsing in the Swift Compiler that the apple/swift codebase makes use of LLVM libraries. Specifically, that article described how the actual logic of parsing command-line strings was handled in libLLVMOption.

Likewise, the actual logic involved in storing source program text in memory, representing locations in that source text, and printing diagnostic messages, is handled in the LLVM library libLLVMSupport.

Here's a small sample program that demonstrates how libLLVMSupport can be used to emit a diagnostic:

libLLVMSupport-Example.cpp

 1  #include "llvm/Support/MemoryBuffer.h"
 2  #include "llvm/Support/SourceMgr.h"
 3  
 4  int main() {
 5    // This string will represent our source program.
 6    llvm::StringRef Input =
 7      "func foo() {\n"
 8      "  print(\"Hello!\")\n"
 9      "}";
10  
11    // The llvm::MemoryBuffer class is used to store
12    // large strings, along with metadata such as a
13    // buffer or file name. Here we instantiate a
14    // llvm::MemoryBuffer to store the contents of
15    // our source program.
16    std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
17      llvm::MemoryBuffer::getMemBuffer(Input);
18  
19    // The llvm::SourceMgr class is used to emit
20    // diagnostics for one or more llvm::MemoryBuffer
21    // instances. Here we instantiate a new
22    // llvm::SourceMgr and transfer ownership of our
23    // input buffer over to it.
24    llvm::SourceMgr SourceManager;
25    SourceManager.AddNewSourceBuffer(
26        std::move(InputBuffer),
27        /*IncludeLoc*/ llvm::SMLoc());
28  
29    // Here we grab a pointer into the buffer.
30    // Incrementing and decrementing this pointer
31    // allows us to traverse the source program.
32    const llvm::MemoryBuffer *SourceBuffer =
33      SourceManager.getMemoryBuffer(1);
34    const char *CurrentCharacter =
35      SourceBuffer->getBufferStart();
36  
37    // The llvm::SMLoc class is used to represent a
38    // location in an llvm::MemoryBuffer that is managed
39    // by llvm::SourceMgr. We instantiate an llvm::SMLoc
40    // here, for the starting location.
41    llvm::SMLoc BufferStartLocation =
42      llvm::SMLoc::getFromPointer(CurrentCharacter);
43  
44    // The llvm::SourceMgr::PrintMessage function allows
45    // us to print a caret ^ at a specific llvm::SMLoc
46    // location.
47    SourceManager.PrintMessage(
48        BufferStartLocation,
49        llvm::SourceMgr::DiagKind::DK_Remark,
50        "This is the very beginning of the "
51        "source buffer.");
..
96    return 0;
97  }

Running the above program results in the following output:

remark: This is the very beginning of the source buffer.
func foo() {
^

By incrementing the buffer pointer, I can print diagnostics at other locations as well. In the expanded example below, I use llvm::SMRange to print a warning that points to the string "Hello!" in the source program:

libLLVMSupport-Example.cpp

 1  #include "llvm/Support/MemoryBuffer.h"
 2  #include "llvm/Support/SourceMgr.h"
 3  
 4  int main() {
..  
53    // Let's increment our buffer pointer until we find
54    // the first quotation mark character: the first "
55    // in the source text line 'print("Hello!")'. Then
56    // let's record that in an llvm::SMLoc location.
57    while (*CurrentCharacter != '"')
58      ++CurrentCharacter;
59    llvm::SMLoc StartLocation =
60      llvm::SMLoc::getFromPointer(CurrentCharacter);
61  
62    // Next, let's get the llvm::SMLoc location
63    // representing the end of the string "Hello!",
64    // by finding the first character past the last
65    // quotation mark.
66    while (*CurrentCharacter != ')')
67      ++CurrentCharacter;
68    llvm::SMLoc EndLocation =
69      llvm::SMLoc::getFromPointer(CurrentCharacter);
70  
71    // The llvm::SMRange class represents a range: a
72    // beginning and an end llvm::SMLoc location.
73    llvm::SMRange Range = llvm::SMRange(StartLocation,
74                                        EndLocation);
75  
76    // We can print a warning that points to this
77    // llvm::SMRange range.
78    SourceManager.PrintMessage(
79        StartLocation,
80        llvm::SourceMgr::DiagKind::DK_Warning,
81        "This is the range of source text in which "
82        "a string literal appears.",
83        Range);
..
96    return 0;
97  }

Now running the program results in two diagnostics being emitted:

remark: This is the very beginning of the source buffer.
func foo() {
^
warning: This is the range of source text in which a string literal appears.
  print("Hello!")
        ^~~~~~~~

And I can use llvm::SMFixIt in order to display a suggestion for a string to replace "Hello!":

libLLVMSupport-Example.cpp

 1  #include "llvm/Support/MemoryBuffer.h"
 2  #include "llvm/Support/SourceMgr.h"
 3  
 4  int main() {
..
78    SourceManager.PrintMessage(
79        StartLocation,
80        llvm::SourceMgr::DiagKind::DK_Warning,
81        "This is the range of source text in which "
82        "a string literal appears.",
83        Range);
84  
85    // The llvm::SMFixIt class allows us to print
86    // a replacement suggestion underneath the
87    // caret ^ output.
88    SourceManager.PrintMessage(
89        StartLocation,
90        llvm::SourceMgr::DiagKind::DK_Note,
91        "This is a fix-it that suggests an "
92        "alternative string.",
93        llvm::None,
94        llvm::SMFixIt(Range, "\"Good-bye!\""));
95
96    return 0;
97  }

The text associated with the llvm::SMFixIt is displayed below the underline:

remark: This is the very beginning of the source buffer.
func foo() {
^
warning: This is the range of source text in which a string literal appears.
  print("Hello!")
        ^~~~~~~~
note: This is a fix-it that suggests an alternative string.
  print("Hello!")
        ^~~~~~~~
        "Good-bye!"

llvm::SourceMgr also has some rudimentary support for emitting diagnostics for files included by other files. For example, I can mark our source buffer as having been included by another buffer, like so:

libLLVMSupport-Example.cpp

  1    #include "llvm/Support/MemoryBuffer.h"
  2    #include "llvm/Support/SourceMgr.h"
  3  
  4    int main() {
  5  +   llvm::StringRef IncludeInput = "import foo";
  6  +   llvm::SourceMgr SourceManager;
  7  +   SourceManager.AddNewSourceBuffer(
  8  +       std::move(
  9  +         llvm::MemoryBuffer::getMemBuffer(IncludeInput)),
 10  +       /*IncludeLoc*/ llvm::SMLoc());
 11  +   llvm::SMLoc IncludeLocation =
 12  +     llvm::SMLoc::getFromPointer(
 13  +       SourceManager.getMemoryBuffer(1)->getBufferStart());
 14  
 15      // This string will represent our source program.
 16      llvm::StringRef Input =
 17        "func foo() {\n"
 18        "  print(\"Hello!\")\n"
 19        "}";
 20  
 21      // The llvm::MemoryBuffer class is used to store
 22      // large strings, along with metadata such as a
 23      // buffer or file name. Here we instantiate a
 24      // llvm::MemoryBuffer to store the contents of
 25      // our source program.
 26      std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
 27        llvm::MemoryBuffer::getMemBuffer(Input);
 28  
 29      // The llvm::SourceMgr class is used to emit
 30      // diagnostics for one or more llvm::MemoryBuffer
 31      // instances. Here we instantiate a new
 32      // llvm::SourceMgr and transfer ownership of our
 33      // input buffer over to it.
     -   llvm::SourceMgr SourceManager;
 34      SourceManager.AddNewSourceBuffer(
 35          std::move(InputBuffer),
     -       /*IncludeLoc*/ llvm::SMLoc());
 36  +       /*IncludeLoc*/ IncludeLocation);
 37  
 38      // Here we grab a pointer into the buffer.
 39      // Incrementing and decrementing this pointer
 40      // allows us to traverse the source program.
 41      const llvm::MemoryBuffer *SourceBuffer =
     -     SourceManager.getMemoryBuffer(1);
 42  +     SourceManager.getMemoryBuffer(2);
 43      const char *CurrentCharacter =
 44        SourceBuffer->getBufferStart();
...
106    }

Now the diagnostics emitted for each llvm::SMLoc location also include a line indicating where those locations were "included" from:

Included from :1:
remark: This is the very beginning of the source buffer.
func foo() {
^
Included from :1:
warning: This is the range of source text in which a string literal appears.
  print("Hello!")
        ^~~~~~~~
Included from :1:
note: This is a fix-it that suggests an alternative string.
  print("Hello!")
        ^~~~~~~~
        "Good-bye!"

Since our source buffers are not named, llvm::SourceMgr just outputs the line on which the "include" occurs – in our case, the first line in the buffer, 1 – and so the output is "Included from :1:".

Assigning identifiers to the llvm::MemoryBuffer instances improves the output:

libLLVMSupport-Example.cpp

  4    int main() {
  5      llvm::StringRef IncludeInput = "import foo";
  6      llvm::SourceMgr SourceManager;
  7      SourceManager.AddNewSourceBuffer(
  8          std::move(
     -       llvm::MemoryBuffer::getMemBuffer(IncludeInput)),
  9  +       llvm::MemoryBuffer::getMemBuffer(IncludeInput,
 10  +                                        "include.swift")),
 11          /*IncludeLoc*/ llvm::SMLoc());
 ..
 27      std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
     -     llvm::MemoryBuffer::getMemBuffer(Input);
 28  +     llvm::MemoryBuffer::getMemBuffer(Input, "foo.swift");
...
107    }

Now the output includes the file names:

Included from include.swift:1:
foo.swift:1:1: remark: This is the very beginning of the source buffer.
func foo() {
^
Included from include.swift:1:
foo.swift:2:9: warning: This is the range of source text in which a string literal appears.
  print("Hello!")
        ^~~~~~~~
Included from include.swift:1:
foo.swift:2:9: note: This is a fix-it that suggests an alternative string.
  print("Hello!")
        ^~~~~~~~
        "Good-bye!"

The diagnostics are pretty much identical to Swift's (and to those of Clang, the C/C++/Objective-C compiler). That's because the apple/swift codebase uses the same llvm::MemoryBuffer I used above to store source text in memory, and the same llvm::SourceMgr and llvm::SMLoc to emit its diagnostics.

How exactly do llvm::SourceMgr and llvm::SMLoc work?

llvm::SourceMgr owns a vector of memory buffers – the ones added via calls to llvm::SourceMgr::AddNewSourceBuffer.

llvm::SourceMgr also stores a vector of "include directories," but I'm not quite sure why yet. As far as I can tell, it's only used by one or two internal LLVM tools, like TableGen (remember llvm-tblgen? I wrote about it in an earlier article). Both vectors are declared in the SourceMgr.h header file:

llvm/include/llvm/Support/SourceMgr.h

 40  /// This owns the files read by a parser, handles include stacks,
 41  /// and handles diagnostic wrangling.
 42  class SourceMgr {
 43  public:
 44    enum DiagKind {
 45      DK_Error,
 46      DK_Warning,
 47      DK_Remark,
 48      DK_Note,
 49    };
 ..  
 56  private:
 57    struct SrcBuffer {
 58      /// The memory buffer for the file.
 59      std::unique_ptr<MemoryBuffer> Buffer;
 ..
 85      /// This is the location of the parent include, or null if at the top level.
 86      SMLoc IncludeLoc;
 87  
 88      SrcBuffer() = default;
 ..
 93    };
 94  
 95    /// This is all of the buffers that we are reading from.
 96    std::vector<SrcBuffer> Buffers;
 97  
 98    // This is the list of directories we should search for include files in.
 99    std::vector<std::string> IncludeDirectories;
...  
104    bool isValidBufferID(unsigned i) const { return i && i <= Buffers.size(); }
105  
106  public:
107    SourceMgr() = default;
...
131    const MemoryBuffer *getMemoryBuffer(unsigned i) const {
132      assert(isValidBufferID(i));
133      return Buffers[i - 1].Buffer.get();
134    }
135  
136    unsigned getNumBuffers() const {
137      return Buffers.size();
138    }
139  
140    unsigned getMainFileID() const {
141      assert(getNumBuffers());
142      return 1;
143    }
...
150    /// Add a new source buffer to this source manager. This takes ownership of
151    /// the memory buffer.
152    unsigned AddNewSourceBuffer(std::unique_ptr<MemoryBuffer> F,
153                                SMLoc IncludeLoc) {
154      SrcBuffer NB;
155      NB.Buffer = std::move(F);
156      NB.IncludeLoc = IncludeLoc;
157      Buffers.push_back(std::move(NB));
158      return Buffers.size();
159    }
...
225  };
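
To make the 1-based buffer IDs in the header above a little more concrete, here is a minimal sketch of the same idea using only the C++ standard library. It is my own simplified stand-in, not LLVM code: BufferRegistry and its member names are hypothetical, and plain std::string objects play the role of llvm::MemoryBuffer. The point is that ID 0 is left free to mean "no such buffer", which is why the sample program above asks for buffer 1 rather than buffer 0.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the buffer list kept by a source manager.
class BufferRegistry {
  // Each buffer lives behind a unique_ptr, so pointers into its text stay
  // valid when the vector grows.
  std::vector<std::unique_ptr<std::string>> Buffers;

public:
  // Takes ownership of the text and returns its ID. IDs start at 1.
  unsigned add(std::string Text) {
    Buffers.push_back(
        std::unique_ptr<std::string>(new std::string(std::move(Text))));
    return static_cast<unsigned>(Buffers.size());
  }

  bool isValidID(unsigned ID) const {
    return ID != 0 && ID <= Buffers.size();
  }

  const std::string &get(unsigned ID) const {
    assert(isValidID(ID) && "buffer IDs start at 1");
    return *Buffers[ID - 1];
  }

  // Returns the ID of the buffer whose text contains Ptr, or 0 if none does.
  // The end of the buffer is included, so an end-of-file location still
  // resolves to its buffer.
  unsigned findBufferContaining(const char *Ptr) const {
    std::less_equal<const char *> LessEq;  // well-defined pointer ordering
    for (std::size_t I = 0; I < Buffers.size(); ++I) {
      const char *Start = Buffers[I]->data();
      const char *End = Start + Buffers[I]->size();
      if (LessEq(Start, Ptr) && LessEq(Ptr, End))
        return static_cast<unsigned>(I + 1);
    }
    return 0;
  }
};
```

A location here is nothing more than a pointer into one of the stored strings, which is also all that llvm::SMLoc::getFromPointer was given in the sample program.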

The llvm::SourceMgr header file also declares the llvm::SourceMgr::PrintMessage member function, which I used above to print diagnostics. This is a convenience function that creates an llvm::SMDiagnostic instance, and then calls the llvm::SMDiagnostic::print function.

llvm/include/llvm/Support/SourceMgr.h

 42  class SourceMgr {
...
186    /// Emit a message about the specified location with the specified string.
187    ///
188    /// \param ShowColors Display colored messages if output is a terminal and
189    /// the default error handler is used.
190    void PrintMessage(raw_ostream &OS, SMLoc Loc, DiagKind Kind,
191                      const Twine &Msg,
192                      ArrayRef<SMRange> Ranges = None,
193                      ArrayRef<SMFixIt> FixIts = None,
194                      bool ShowColors = true) const;
195  
196    /// Emits a diagnostic to llvm::errs().
197    void PrintMessage(SMLoc Loc, DiagKind Kind, const Twine &Msg,
198                      ArrayRef<SMRange> Ranges = None,
199                      ArrayRef<SMFixIt> FixIts = None,
200                      bool ShowColors = true) const;
201  
202    /// Emits a manually-constructed diagnostic to the given output stream.
203    ///
204    /// \param ShowColors Display colored messages if output is a terminal and
205    /// the default error handler is used.
206    void PrintMessage(raw_ostream &OS, const SMDiagnostic &Diagnostic,
207                      bool ShowColors = true) const;
...
225  };

llvm/lib/Support/SourceMgr.cpp

231  void SourceMgr::PrintMessage(raw_ostream &OS, const SMDiagnostic &Diagnostic,
232                               bool ShowColors) const {
...  
239    if (Diagnostic.getLoc().isValid()) {
240      unsigned CurBuf = FindBufferContainingLoc(Diagnostic.getLoc());
241      assert(CurBuf && "Invalid or unspecified location!");
242      PrintIncludeStack(getBufferInfo(CurBuf).IncludeLoc, OS);
243    }
244  
245    Diagnostic.print(nullptr, OS, ShowColors);
246  }
247  
248  void SourceMgr::PrintMessage(raw_ostream &OS, SMLoc Loc,
249                               SourceMgr::DiagKind Kind,
250                               const Twine &Msg, ArrayRef<SMRange> Ranges,
251                               ArrayRef<SMFixIt> FixIts, bool ShowColors) const {
252    PrintMessage(OS, GetMessage(Loc, Kind, Msg, Ranges, FixIts), ShowColors);
253  }
254  
255  void SourceMgr::PrintMessage(SMLoc Loc, SourceMgr::DiagKind Kind,
256                               const Twine &Msg, ArrayRef<SMRange> Ranges,
257                               ArrayRef<SMFixIt> FixIts, bool ShowColors) const {
258    PrintMessage(errs(), Loc, Kind, Msg, Ranges, FixIts, ShowColors);
259  }
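
The call to PrintIncludeStack above is what produces the "Included from" lines shown earlier. I haven't reproduced its source here, so the following is only a sketch of the general idea under simplifying assumptions: the names IncludedBuffer, printIncludeStack, and includeStackFor are hypothetical, and each buffer records the line of its include directly instead of an llvm::SMLoc that would have to be mapped back to a line.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified description of one buffer.
struct IncludedBuffer {
  std::string Name;      // buffer identifier; may be empty
  unsigned ParentID;     // 1-based ID of the including buffer, 0 = top level
  unsigned IncludeLine;  // line in the parent on which the include occurs
};

// Writes one "Included from <name>:<line>:" line per ancestor of buffer ID,
// outermost ancestor first. Top-level buffers produce no output.
void printIncludeStack(const std::vector<IncludedBuffer> &Buffers,
                       unsigned ID, std::ostream &OS) {
  assert(ID != 0 && ID <= Buffers.size() && "buffer IDs start at 1");
  const IncludedBuffer &Buffer = Buffers[ID - 1];
  if (Buffer.ParentID == 0)
    return;
  // Recurse first so that the outermost include is printed first.
  printIncludeStack(Buffers, Buffer.ParentID, OS);
  OS << "Included from " << Buffers[Buffer.ParentID - 1].Name << ":"
     << Buffer.IncludeLine << ":\n";
}

// Convenience wrapper that returns the include stack as a string.
std::string includeStackFor(const std::vector<IncludedBuffer> &Buffers,
                            unsigned ID) {
  std::ostringstream OS;
  printIncludeStack(Buffers, ID, OS);
  return OS.str();
}
```

With an unnamed parent buffer this yields "Included from :1:", and once the parent is named include.swift it yields "Included from include.swift:1:", matching the two outputs shown earlier.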

The llvm::SMDiagnostic class is also declared in the llvm::SourceMgr header file. It simply stores the data necessary when printing a diagnostic, and it declares the llvm::SMDiagnostic::print member function. This header also declares llvm::SMFixIt, which is also a simple bag of data:

llvm/include/llvm/Support/SourceMgr.h

227  /// Represents a single fixit, a replacement of one range of text with another.
228  class SMFixIt {
229    SMRange Range;
230  
231    std::string Text;
...
256  };
257  
258  /// Instances of this class encapsulate one diagnostic report, allowing
259  /// printing to a raw_ostream as a caret diagnostic.
260  class SMDiagnostic {
261    const SourceMgr *SM = nullptr;
262    SMLoc Loc;
263    std::string Filename;
264    int LineNo = 0;
265    int ColumnNo = 0;
266    SourceMgr::DiagKind Kind = SourceMgr::DK_Error;
267    std::string Message, LineContents;
268    std::vector<std::pair<unsigned, unsigned>> Ranges;
269    SmallVector<SMFixIt, 4> FixIts;
270  
271  public:
...  
278    // Diagnostic with a location.
279    SMDiagnostic(const SourceMgr &sm, SMLoc L, StringRef FN,
280                 int Line, int Col, SourceMgr::DiagKind Kind,
281                 StringRef Msg, StringRef LineStr,
282                 ArrayRef<std::pair<unsigned,unsigned>> Ranges,
283                 ArrayRef<SMFixIt> FixIts = None);
...  
303    void print(const char *ProgName, raw_ostream &S, bool ShowColors = true,
304               bool ShowKindLabel = true) const;
305  };
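
Since a fix-it is just a range plus replacement text, it's easy to imagine what a tool could do with one. The sketch below is a hypothetical illustration in standard C++, not something llvm::SourceMgr does for you (as far as this article has shown, it only prints the suggestion). SimpleFixIt and applyFixIt are made-up names, and the range is expressed as offsets into a string rather than as an llvm::SMRange.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical simplified fix-it: replace the half-open offset range
// [Start, End) of some source text with Text.
struct SimpleFixIt {
  std::size_t Start;
  std::size_t End;
  std::string Text;
};

// Returns a copy of Source with the fix-it applied. An empty range
// (Start == End) turns the fix-it into a pure insertion.
std::string applyFixIt(const std::string &Source, const SimpleFixIt &Fix) {
  assert(Fix.Start <= Fix.End && Fix.End <= Source.size());
  std::string Result = Source;
  Result.replace(Fix.Start, Fix.End - Fix.Start, Fix.Text);
  return Result;
}
```

Applied to the line print("Hello!") from the sample program (indented by two spaces, so the string literal occupies offsets 8 through 15), a fix-it carrying "Good-bye!" turns the line into print("Good-bye!").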

Note that llvm::SMDiagnostic is initialized with not just the message that will be printed, but also the source line to print beneath it; both are stored as std::string members. The llvm::SourceMgr::GetMessage member function, which is called when llvm::SourceMgr::PrintMessage is used to emit a diagnostic, grabs the source line, called LineStr below, by scanning backward and forward from the location for newline characters ('\n' or '\r') in the source buffer.

llvm/lib/Support/SourceMgr.cpp

169  SMDiagnostic SourceMgr::GetMessage(SMLoc Loc, SourceMgr::DiagKind Kind,
170                                     const Twine &Msg,
171                                     ArrayRef<SMRange> Ranges,
172                                     ArrayRef<SMFixIt> FixIts) const {
...
187      // Scan backward to find the start of the line.
188      const char *LineStart = Loc.getPointer();
189      const char *BufStart = CurMB->getBufferStart();
190      while (LineStart != BufStart && LineStart[-1] != '\n' &&
191             LineStart[-1] != '\r')
192        --LineStart;
193  
194      // Get the end of the line.
195      const char *LineEnd = Loc.getPointer();
196      const char *BufEnd = CurMB->getBufferEnd();
197      while (LineEnd != BufEnd && LineEnd[0] != '\n' && LineEnd[0] != '\r')
198        ++LineEnd;
199      LineStr = std::string(LineStart, LineEnd);
...    
226    return SMDiagnostic(*this, Loc, BufferID, LineAndCol.first,
227                        LineAndCol.second-1, Kind, Msg.str(),
228                        LineStr, ColRanges, FixIts);
229  }
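
The backward and forward scans in the excerpt above translate almost directly into standard C++. Here's a small self-contained sketch that works on offsets into a std::string instead of raw pointers. The names LineInfo and describeOffset are hypothetical, and the line number is computed by naively counting '\n' characters, which is an assumption of this sketch rather than what llvm::SourceMgr does (it has caching logic for that).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical result type: where an offset falls, plus the line's text.
struct LineInfo {
  unsigned Line;     // 1-based line number
  unsigned Column;   // 1-based column number
  std::string Text;  // contents of the line, without its line terminator
};

LineInfo describeOffset(const std::string &Buffer, std::size_t Offset) {
  assert(Offset <= Buffer.size());

  // Scan backward to find the start of the line.
  std::size_t LineStart = Offset;
  while (LineStart != 0 && Buffer[LineStart - 1] != '\n' &&
         Buffer[LineStart - 1] != '\r')
    --LineStart;

  // Scan forward to find the end of the line.
  std::size_t LineEnd = Offset;
  while (LineEnd != Buffer.size() && Buffer[LineEnd] != '\n' &&
         Buffer[LineEnd] != '\r')
    ++LineEnd;

  // Count the newlines that precede the line (naive, no caching).
  unsigned Line = 1;
  for (std::size_t I = 0; I < LineStart; ++I)
    if (Buffer[I] == '\n')
      ++Line;

  return {Line, static_cast<unsigned>(Offset - LineStart) + 1,
          Buffer.substr(LineStart, LineEnd - LineStart)};
}
```

For the sample source program, the offset of the first quotation mark maps to line 2, column 9, which is the "foo.swift:2:9" that appeared in the named-buffer output earlier.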

The implementation of the llvm::SMDiagnostic::print function is what actually prints textual output (by default to llvm::errs(), LLVM's raw_ostream for standard error, also known as STDERR). It's a fairly simple function, composed mostly of if and print statements:

llvm/lib/Support/SourceMgr.cpp

368  void SMDiagnostic::print(const char *ProgName, raw_ostream &S, bool ShowColors,
369                           bool ShowKindLabel) const {
...  
376    if (ProgName && ProgName[0])
377      S << ProgName << ": ";
378  
379    if (!Filename.empty()) {
380      if (Filename == "-")
381        S << "<stdin>";
382      else
383        S << Filename;
384  
385      if (LineNo != -1) {
386        S << ':' << LineNo;
387        if (ColumnNo != -1)
388          S << ':' << (ColumnNo+1);
389      }
390      S << ": ";
391    }
392  
393    if (ShowKindLabel) {
394      switch (Kind) {
395      case SourceMgr::DK_Error:
...
398        S << "error: ";
399        break;
400      case SourceMgr::DK_Warning:
...
403        S << "warning: ";
404        break;
405      case SourceMgr::DK_Note:
...
408        S << "note: ";
409        break;
410      case SourceMgr::DK_Remark:
...
413        S << "remark: ";
414        break;
415      }
...
421    }
422
423    S << Message << '\n';
...
442    // Build the line with the caret and ranges.
443    std::string CaretLine(NumColumns+1, ' ');
444    
445    // Expand any ranges.
446    for (unsigned r = 0, e = Ranges.size(); r != e; ++r) {
447      std::pair<unsigned, unsigned> R = Ranges[r];
448      std::fill(&CaretLine[R.first],
449                &CaretLine[std::min((size_t)R.second, CaretLine.size())],
450                '~');
451    }
452
453    // Add any fix-its.
...
455    std::string FixItInsertionLine;
456    buildFixItLine(CaretLine, FixItInsertionLine, FixIts,
457                   makeArrayRef(Loc.getPointer() - ColumnNo,
458                                LineContents.size()));
459
460    // Finally, plop on the caret.
461    if (unsigned(ColumnNo) <= NumColumns)
462      CaretLine[ColumnNo] = '^';
463    else 
464      CaretLine[NumColumns] = '^';
465
466    // ... and remove trailing whitespace so the output doesn't wrap for it.  We
467    // know that the line isn't completely empty because it has the caret in it at
468    // least.
469    CaretLine.erase(CaretLine.find_last_not_of(' ')+1);
470    
471    printSourceLine(S, LineContents);
...
519    S << '\n';
520  }
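
The caret line logic in the excerpt above is worth isolating, because it really is just string manipulation. Below is a minimal sketch of those steps in standard C++. The function name buildCaretLine and its parameters are hypothetical, columns and ranges are 0-based offsets into the source line, and fix-its, tabs, and colors are left out entirely.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Builds the line printed underneath a source line: '~' under each
// half-open column range [first, second), and '^' at the caret column.
std::string buildCaretLine(
    std::size_t LineLength, std::size_t CaretColumn,
    const std::vector<std::pair<std::size_t, std::size_t>> &Ranges) {
  // One extra column, so a caret can point just past the end of the line.
  std::string CaretLine(LineLength + 1, ' ');

  // Expand any ranges, clamping each one to the width of the caret line.
  for (const std::pair<std::size_t, std::size_t> &R : Ranges) {
    const std::size_t End = std::min(R.second, CaretLine.size());
    for (std::size_t I = R.first; I < End; ++I)
      CaretLine[I] = '~';
  }

  // Place the caret last, so that it overwrites any '~' at its column.
  CaretLine[std::min(CaretColumn, LineLength)] = '^';

  // Remove trailing whitespace. The caret guarantees the line isn't empty.
  CaretLine.erase(CaretLine.find_last_not_of(' ') + 1);
  return CaretLine;
}
```

For a 16-character line such as print(("Yikes.") and a caret column of 16, this produces sixteen spaces followed by a caret, which is how a caret can end up one column past the last character of the line.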

What I wrote about above represents pretty much everything libLLVMSupport provides in the way of emitting diagnostics. I enjoy reading code like this in libLLVMSupport because it demystifies things that I would otherwise take for granted. The functionality in llvm::SourceMgr and llvm::SMDiagnostic really boils down to storing strings in memory and printing to the console, but for some reason as a compiler user I had always assumed there was something more magical going on behind the scenes. Maybe you felt similarly after reading this article.

I did, however, leave out some details, such as the internals of llvm::MemoryBuffer, and some of the complex caching logic that speeds up functions like llvm::SourceMgr::getLineAndColumn. The former is especially interesting and I hope to write about it more someday – take a look yourself if you have the chance, by reading through llvm/include/llvm/Support/MemoryBuffer.h and llvm/lib/Support/MemoryBuffer.cpp.

In the next article, I'll explain more about how the apple/swift codebase wraps llvm::SourceMgr with the swift::SourceManager class, how swift::CompilerInstance instantiates an llvm::MemoryBuffer for each source file, and about other Swift-specific abstractions, such as swift::DiagnosticEngine.