How the Swift Compiler Emits Diagnostics, Part 1: LLVM Abstractions
The prior article in this series explained how the Swift compiler lexes and parses source code, by looking at this syntactically valid Swift program:
hello.swift
    // hello.swift

    print("Hello, world!")
In this and the next few articles, I'd like to take a look at this ill-formed Swift program (note the extra opening parenthesis):
uhoh.swift
    // uhoh.swift

    print(("Yikes.")
Compiling this program results in two diagnostics: one error diagnostic, and one note diagnostic:
    swiftc uhoh.swift
    uhoh.swift:3:17: error: expected ')' in expression list
    print(("Yikes.")
                    ^
    uhoh.swift:3:6: note: to match this opening '('
    print(("Yikes.")
         ^
As I explained in the previous article, the Swift parser instructs its lexer to lex tokens in the source program. It inspects each token, and determines what tokens need to be lexed next in order for the program to be valid. This is what I imagine the parser is "thinking" when it parses the program above:
- "I instruct the lexer to lex the first token in the source code. The lexer returns to me that token, which is `print`. I know that this is an identifier. I need to keep parsing to determine whether this identifier is being used as a variable, as part of a function call, or in some other way. To find out, I'll instruct the lexer to lex another token."
- "The lexer has lexed the token `(`, an opening parenthesis. I am probably parsing a function call, and as such I expect to find a list of zero or more expressions – the function arguments – followed by a closing parenthesis `)`. I'll instruct the lexer to lex another token."
- "The lexer has lexed the token `(`, another opening parenthesis. OK, so maybe there's a tuple argument being passed in as a function argument, such as `print((true, 1))` or something." (Keep in mind that the parser doesn't know anything about whether the `print` function provided by the Swift standard library can take a tuple as an argument – that's something for the Swift type-checker, in libswiftSema, to determine.) "So now I need to parse another list of zero or more expressions to fit inside this tuple or whatever, and then find two closing parentheses: one for the `print(`, and one for the `print((`. I'll instruct the lexer to lex another token."
- "The lexer has lexed the string literal token `"Yikes."`. OK, I guess this is the first expression inside the tuple. Let's keep going: I instruct the lexer to lex another token."
- "The lexer has lexed the token `)`, a closing parenthesis. Aha, so it looks like I've parsed a tuple: `("Yikes.")`. OK, so that's the first expression in a list of expressions being passed as arguments to `print(`. I need to keep parsing expressions in that list of arguments, and then find a closing parenthesis `)` to end the list. I instruct the lexer to lex another token."
- "The lexer has lexed the token… `eof`?? End of file? What the heck? Where's the closing parenthesis for `print(`? OK, this source code is messed up. I should let the user know by emitting a diagnostic."
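The end result of that inner monologue can be imitated in a few lines of C++. This is only a sketch, not how the Swift parser is implemented: `Diag`, `OpenParen`, and `checkParens` are hypothetical names invented for illustration, and the function scans characters directly instead of asking a lexer for tokens. It remembers where each opening parenthesis was seen and, once it runs out of input, reports an error at the end-of-file position plus a note pointing back at the unmatched parenthesis.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical diagnostic record; lines and columns are 1-based.
struct Diag {
  std::string Kind; // "error" or "note"
  unsigned Line;
  unsigned Column;
  std::string Message;
};

// Position of an opening parenthesis that has not been closed yet.
struct OpenParen {
  unsigned Line;
  unsigned Column;
};

// Assumption: string literals never span lines and contain no escaped quotes.
// Comments are not treated specially.
std::vector<Diag> checkParens(const std::string &Source) {
  std::vector<Diag> Diags;
  std::vector<OpenParen> Open;
  unsigned Line = 1, Column = 1;
  bool InString = false;

  for (char C : Source) {
    if (C == '\n') {
      ++Line;
      Column = 1;
      InString = false;
      continue;
    }
    if (C == '"')
      InString = !InString;
    else if (!InString && C == '(')
      Open.push_back({Line, Column});
    else if (!InString && C == ')' && !Open.empty())
      Open.pop_back();
    ++Column;
  }

  // End of input: Line/Column now name the position just past the last
  // character. Every parenthesis still open gets an error and a note.
  while (!Open.empty()) {
    OpenParen Paren = Open.back();
    Open.pop_back();
    assert(Paren.Line <= Line && "an open paren cannot be past the end");
    Diags.push_back({"error", Line, Column, "expected ')' in expression list"});
    Diags.push_back({"note", Paren.Line, Paren.Column,
                     "to match this opening '('"});
  }
  return Diags;
}
```

Calling `checkParens` on the contents of uhoh.swift yields an error at line 3, column 17 and a note at line 3, column 6, the same positions swiftc reports.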
I explained the mechanics of steps 1 through 5 above in the previous article. In this article, I'll skim over those steps and focus instead on the diagnostic output itself:
    swiftc uhoh.swift
    uhoh.swift:3:17: error: expected ')' in expression list
    print(("Yikes.")
                    ^
    uhoh.swift:3:6: note: to match this opening '('
    print(("Yikes.")
         ^
How does the parser know where to place the `^` carets in the output above? How is the text actually printed? Where is that logic defined?
It turns out the answers to these questions span both the Swift and LLVM codebases.
Representing source text and locations, and emitting diagnostics, using libLLVMSupport
You may recall from my article on Option Parsing in the Swift Compiler that the apple/swift codebase makes use of LLVM libraries. Specifically, that article described how the actual logic of parsing command-line strings was handled in libLLVMOption.
Likewise, the actual logic involved in storing source program text in memory, representing locations in that source text, and printing diagnostic messages, is handled in the LLVM library libLLVMSupport.
Here's a small sample program that demonstrates how libLLVMSupport can be used to emit a diagnostic:
libLLVMSupport-Example.cpp
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"

    int main() {
      // This string will represent our source program.
      llvm::StringRef Input =
        "func foo() {\n"
        "  print(\"Hello!\")\n"
        "}";

      // The llvm::MemoryBuffer class is used to store
      // large strings, along with metadata such as a
      // buffer or file name. Here we instantiate a
      // llvm::MemoryBuffer to store the contents of
      // our source program.
      std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
        llvm::MemoryBuffer::getMemBuffer(Input);

      // The llvm::SourceMgr class is used to emit
      // diagnostics for one or more llvm::MemoryBuffer
      // instances. Here we instantiate a new
      // llvm::SourceMgr and transfer ownership of our
      // input buffer over to it.
      llvm::SourceMgr SourceManager;
      SourceManager.AddNewSourceBuffer(
        std::move(InputBuffer),
        /*IncludeLoc*/ llvm::SMLoc());

      // Here we grab a pointer into the buffer.
      // Incrementing and decrementing this pointer
      // allows us to traverse the source program.
      const llvm::MemoryBuffer *SourceBuffer =
        SourceManager.getMemoryBuffer(1);
      const char *CurrentCharacter =
        SourceBuffer->getBufferStart();

      // The llvm::SMLoc class is used to represent a
      // location in an llvm::MemoryBuffer that is managed
      // by llvm::SourceMgr. We instantiate an llvm::SMLoc
      // here, for the starting location.
      llvm::SMLoc BufferStartLocation =
        llvm::SMLoc::getFromPointer(CurrentCharacter);

      // The llvm::SourceMgr::PrintMessage function allows
      // us to print a caret ^ at a specific llvm::SMLoc
      // location.
      SourceManager.PrintMessage(
        BufferStartLocation,
        llvm::SourceMgr::DiagKind::DK_Remark,
        "This is the very beginning of the "
        "source buffer.");
      ...
      return 0;
    }
Running the above program results in the following output:
    remark: This is the very beginning of the source buffer.
    func foo() {
    ^
By incrementing the buffer pointer, I can print diagnostics at other locations as well. In the expanded example below, I use `llvm::SMRange` to print a warning that points to the string `"Hello!"` in the source program:
libLLVMSupport-Example.cpp
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"

    int main() {
      ...
      // Let's increment our buffer pointer until we find
      // the first quotation mark character: the first "
      // in the source text line 'print("Hello!")'. Then
      // let's record that in an llvm::SMLoc location.
      while (*CurrentCharacter != '"')
        ++CurrentCharacter;
      llvm::SMLoc StartLocation =
        llvm::SMLoc::getFromPointer(CurrentCharacter);

      // Next, let's get the llvm::SMLoc location
      // representing the end of the string "Hello!",
      // by finding the first character past the last
      // quotation mark.
      while (*CurrentCharacter != ')')
        ++CurrentCharacter;
      llvm::SMLoc EndLocation =
        llvm::SMLoc::getFromPointer(CurrentCharacter);

      // The llvm::SMRange class represents a range: a
      // beginning and an end llvm::SMLoc location.
      llvm::SMRange Range = llvm::SMRange(StartLocation,
                                          EndLocation);

      // We can print a warning that points to this
      // llvm::SMRange range.
      SourceManager.PrintMessage(
        StartLocation,
        llvm::SourceMgr::DiagKind::DK_Warning,
        "This is the range of source text in which "
        "a string literal appears.",
        Range);
      ...
      return 0;
    }
Now running the program results in two diagnostics being emitted:
    remark: This is the very beginning of the source buffer.
    func foo() {
    ^
    warning: This is the range of source text in which a string literal appears.
      print("Hello!")
            ^~~~~~~~
And I can use `llvm::SMFixIt` in order to display a suggestion for a string to replace `"Hello!"`:
libLLVMSupport-Example.cpp
    #include "llvm/Support/MemoryBuffer.h"
    #include "llvm/Support/SourceMgr.h"

    int main() {
      ...
      SourceManager.PrintMessage(
        StartLocation,
        llvm::SourceMgr::DiagKind::DK_Warning,
        "This is the range of source text in which "
        "a string literal appears.",
        Range);

      // The llvm::SMFixIt class allows us to print
      // a replacement suggestion underneath the
      // caret ^ output.
      SourceManager.PrintMessage(
        StartLocation,
        llvm::SourceMgr::DiagKind::DK_Note,
        "This is a fix-it that suggests an "
        "alternative string.",
        llvm::None,
        llvm::SMFixIt(Range, "\"Good-bye!\""));

      return 0;
    }
The text associated with the `llvm::SMFixIt` is displayed below the underline:

    remark: This is the very beginning of the source buffer.
    func foo() {
    ^
    warning: This is the range of source text in which a string literal appears.
      print("Hello!")
            ^~~~~~~~
    note: This is a fix-it that suggests an alternative string.
      print("Hello!")
            ^~~~~~~~
            "Good-bye!"
`llvm::SourceMgr` also has some rudimentary support for emitting diagnostics for files included by other files. For example, I can mark our source buffer as having been included by another buffer, like so:
libLLVMSupport-Example.cpp
      #include "llvm/Support/MemoryBuffer.h"
      #include "llvm/Support/SourceMgr.h"

      int main() {
    +   llvm::StringRef IncludeInput = "import foo";
    +   llvm::SourceMgr SourceManager;
    +   SourceManager.AddNewSourceBuffer(
    +     std::move(
    +       llvm::MemoryBuffer::getMemBuffer(IncludeInput)),
    +     /*IncludeLoc*/ llvm::SMLoc());
    +   llvm::SMLoc IncludeLocation =
    +     llvm::SMLoc::getFromPointer(
    +       SourceManager.getMemoryBuffer(1)->getBufferStart());

        // This string will represent our source program.
        llvm::StringRef Input =
          "func foo() {\n"
          "  print(\"Hello!\")\n"
          "}";

        // The llvm::MemoryBuffer class is used to store
        // large strings, along with metadata such as a
        // buffer or file name. Here we instantiate a
        // llvm::MemoryBuffer to store the contents of
        // our source program.
        std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
          llvm::MemoryBuffer::getMemBuffer(Input);

        // The llvm::SourceMgr class is used to emit
        // diagnostics for one or more llvm::MemoryBuffer
        // instances. Here we instantiate a new
        // llvm::SourceMgr and transfer ownership of our
        // input buffer over to it.
    -   llvm::SourceMgr SourceManager;
        SourceManager.AddNewSourceBuffer(
          std::move(InputBuffer),
    -     /*IncludeLoc*/ llvm::SMLoc());
    +     /*IncludeLoc*/ IncludeLocation);

        // Here we grab a pointer into the buffer.
        // Incrementing and decrementing this pointer
        // allows us to traverse the source program.
        const llvm::MemoryBuffer *SourceBuffer =
    -     SourceManager.getMemoryBuffer(1);
    +     SourceManager.getMemoryBuffer(2);
        const char *CurrentCharacter =
          SourceBuffer->getBufferStart();
        ...
      }
Now the diagnostics emitted for each `llvm::SMLoc` location also include a line indicating where those locations were "included" from:

    Included from :1:
    remark: This is the very beginning of the source buffer.
    func foo() {
    ^
    Included from :1:
    warning: This is the range of source text in which a string literal appears.
      print("Hello!")
            ^~~~~~~~
    Included from :1:
    note: This is a fix-it that suggests an alternative string.
      print("Hello!")
            ^~~~~~~~
            "Good-bye!"
Since our source buffers are not named, `llvm::SourceMgr` just outputs the line on which the "include" occurs – in our case, the first line in the buffer, `1` – and so the output is "Included from :1:".
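To make the shape of that "Included from" line concrete, here is a small standard-library-only sketch. It does not use LLVM: the `Buffer` struct, `printIncludeStack`, and `includeStackFor` are hypothetical stand-ins, and the include location is stored as a plain buffer ID and line number rather than an `llvm::SMLoc`. The point is only that an empty buffer name leaves nothing between "Included from " and the colon.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a source buffer and its include location.
struct Buffer {
  std::string Name;      // buffer identifier; empty if the buffer is unnamed
  unsigned IncludedFrom; // 1-based ID of the including buffer, 0 if none
  unsigned IncludeLine;  // line of the "include" inside that buffer
};

// Prints one "Included from <name>:<line>:" line per level of inclusion,
// outermost first.
void printIncludeStack(const std::vector<Buffer> &Buffers, unsigned ID,
                       std::ostream &OS) {
  assert(ID >= 1 && ID <= Buffers.size() && "buffer IDs are 1-based");
  const Buffer &B = Buffers[ID - 1];
  if (B.IncludedFrom == 0)
    return; // top-level buffer: nothing to print

  printIncludeStack(Buffers, B.IncludedFrom, OS);
  OS << "Included from " << Buffers[B.IncludedFrom - 1].Name << ':'
     << B.IncludeLine << ":\n";
}

// Convenience wrapper that returns the include stack as a string.
std::string includeStackFor(const std::vector<Buffer> &Buffers, unsigned ID) {
  std::ostringstream OS;
  printIncludeStack(Buffers, ID, OS);
  return OS.str();
}
```

With two unnamed buffers, where the second is included from line 1 of the first, `includeStackFor(Buffers, 2)` produces "Included from :1:" followed by a newline.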
Assigning identifiers to the `llvm::MemoryBuffer` instances improves the output:
libLLVMSupport-Example.cpp
      int main() {
        llvm::StringRef IncludeInput = "import foo";
        llvm::SourceMgr SourceManager;
        SourceManager.AddNewSourceBuffer(
          std::move(
    -       llvm::MemoryBuffer::getMemBuffer(IncludeInput)),
    +       llvm::MemoryBuffer::getMemBuffer(IncludeInput,
    +                                        "include.swift")),
          /*IncludeLoc*/ llvm::SMLoc());
        ...
        std::unique_ptr<llvm::MemoryBuffer> InputBuffer =
    -     llvm::MemoryBuffer::getMemBuffer(Input);
    +     llvm::MemoryBuffer::getMemBuffer(Input, "foo.swift");
        ...
      }
Now the output includes the file names:
    Included from include.swift:1:
    foo.swift:1:1: remark: This is the very beginning of the source buffer.
    func foo() {
    ^
    Included from include.swift:1:
    foo.swift:2:9: warning: This is the range of source text in which a string literal appears.
      print("Hello!")
            ^~~~~~~~
    Included from include.swift:1:
    foo.swift:2:9: note: This is a fix-it that suggests an alternative string.
      print("Hello!")
            ^~~~~~~~
            "Good-bye!"
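The `2:9` in that output is a line and column derived from a raw position in the buffer. The sketch below shows the simplest way to compute such a pair: count the newlines that come before the position. It is an illustration only; `lineAndColumn` is a hypothetical helper that works on a `std::string` and an offset rather than an `llvm::MemoryBuffer` and an `llvm::SMLoc`, and it makes no attempt at the caching LLVM does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Returns the 1-based {line, column} of the character at Offset.
std::pair<unsigned, unsigned> lineAndColumn(const std::string &Buffer,
                                            std::size_t Offset) {
  assert(Offset <= Buffer.size() && "offset must lie within the buffer");

  unsigned Line = 1;
  std::size_t LineStart = 0; // offset of the first character on this line

  for (std::size_t I = 0; I < Offset; ++I) {
    if (Buffer[I] == '\n') {
      ++Line;
      LineStart = I + 1;
    }
  }
  return {Line, static_cast<unsigned>(Offset - LineStart + 1)};
}
```

For the sample program used throughout this article, the offset of the first quotation mark maps to line 2, column 9, which is where the warning above is reported.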
The diagnostics are pretty much identical to Swift's (and Clang's, the C/C++/Objective-C compiler), because the apple/swift codebase uses the same `llvm::MemoryBuffer` I used above in order to store source text in memory, and it uses the same `llvm::SourceMgr` and `llvm::SMLoc` to emit its diagnostics.
How exactly do `llvm::SourceMgr` and `llvm::SMLoc` work?
`llvm::SourceMgr` owns a vector of memory buffers – the ones added via calls to `llvm::SourceMgr::AddNewSourceBuffer`. The `SourceMgr.h` header file also includes:
- A declaration of the `llvm::SourceMgr::DiagKind` enum I used above to indicate the diagnostic severity.
- The definition of a `SrcBuffer` struct used to wrap both the `llvm::MemoryBuffer` and its `llvm::SMLoc` include location.
- The implementation of the `llvm::SourceMgr::AddNewSourceBuffer` member function. Note how simple it is: it just creates a `SrcBuffer` to wrap the `llvm::MemoryBuffer` and its include location, and pushes that `SrcBuffer` onto its vector of buffers.
- Implementations of some incredibly simple functions: `llvm::SourceMgr::isValidBufferID`, for example, simply checks that the given buffer ID is nonzero and within the bounds of the `llvm::SourceMgr::Buffers` vector. `llvm::SourceMgr::getMainFileID`, in particular, makes me chuckle: it just returns `1`! (When I first read the apple/swift source code and saw references to a "main buffer ID", I thought it was something more significant than just the first source buffer added to the `llvm::SourceMgr`.)
`llvm::SourceMgr` also stores a vector of "include directories," but I'm not quite sure why yet. As far as I can tell, it's only used by one or two internal LLVM tools, like TableGen (remember `llvm-tblgen`? I wrote about it in this article).
llvm/include/llvm/Support/SourceMgr.h
    /// This owns the files read by a parser, handles include stacks,
    /// and handles diagnostic wrangling.
    class SourceMgr {
    public:
      enum DiagKind {
        DK_Error,
        DK_Warning,
        DK_Remark,
        DK_Note,
      };
      ...
    private:
      struct SrcBuffer {
        /// The memory buffer for the file.
        std::unique_ptr<MemoryBuffer> Buffer;
        ...
        /// This is the location of the parent include, or null if at the top level.
        SMLoc IncludeLoc;

        SrcBuffer() = default;
        ...
      };

      /// This is all of the buffers that we are reading from.
      std::vector<SrcBuffer> Buffers;

      // This is the list of directories we should search for include files in.
      std::vector<std::string> IncludeDirectories;
      ...
      bool isValidBufferID(unsigned i) const { return i && i <= Buffers.size(); }

    public:
      SourceMgr() = default;
      ...
      const MemoryBuffer *getMemoryBuffer(unsigned i) const {
        assert(isValidBufferID(i));
        return Buffers[i - 1].Buffer.get();
      }

      unsigned getNumBuffers() const {
        return Buffers.size();
      }

      unsigned getMainFileID() const {
        assert(getNumBuffers());
        return 1;
      }
      ...
      /// Add a new source buffer to this source manager. This takes ownership of
      /// the memory buffer.
      unsigned AddNewSourceBuffer(std::unique_ptr<MemoryBuffer> F,
                                  SMLoc IncludeLoc) {
        SrcBuffer NB;
        NB.Buffer = std::move(F);
        NB.IncludeLoc = IncludeLoc;
        Buffers.push_back(std::move(NB));
        return Buffers.size();
      }
      ...
    };
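The 1-based buffer ID scheme in that header is easy to try out without LLVM. Below is a stripped-down sketch: `MiniSourceMgr`, `addBuffer`, and `getBuffer` are hypothetical names, and buffers are plain `std::string` values instead of `llvm::MemoryBuffer` instances with include locations. It mirrors the idea that the ID returned for a new buffer is the vector's size after the push, so 0 is never a valid ID and the first buffer added always has ID 1.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Minimal imitation of the buffer bookkeeping described above.
class MiniSourceMgr {
  std::vector<std::string> Buffers;

public:
  // IDs are 1-based, so 0 can be used to mean "no buffer".
  bool isValidBufferID(unsigned ID) const {
    return ID && ID <= Buffers.size();
  }

  // Takes ownership of the text and returns the new buffer's ID.
  unsigned addBuffer(std::string Text) {
    Buffers.push_back(std::move(Text));
    return static_cast<unsigned>(Buffers.size());
  }

  const std::string &getBuffer(unsigned ID) const {
    assert(isValidBufferID(ID));
    return Buffers[ID - 1]; // translate the 1-based ID to a 0-based index
  }

  // The "main file" is simply whichever buffer was added first.
  unsigned getMainFileID() const {
    assert(!Buffers.empty());
    return 1;
  }
};
```

This is why the include example earlier had to change `getMemoryBuffer(1)` to `getMemoryBuffer(2)`: once the including buffer is added first, the source program becomes the second buffer.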
The `llvm::SourceMgr` header file also declares the `llvm::SourceMgr::PrintMessage` member function, which I used above to print diagnostics. This is a convenience function that creates an `llvm::SMDiagnostic` instance, and then calls the `llvm::SMDiagnostic::print` function.
llvm/include/llvm/Support/SourceMgr.h
    class SourceMgr {
      ...
      /// Emit a message about the specified location with the specified string.
      ///
      /// \param ShowColors Display colored messages if output is a terminal and
      /// the default error handler is used.
      void PrintMessage(raw_ostream &OS, SMLoc Loc, DiagKind Kind,
                        const Twine &Msg,
                        ArrayRef<SMRange> Ranges = None,
                        ArrayRef<SMFixIt> FixIts = None,
                        bool ShowColors = true) const;

      /// Emits a diagnostic to llvm::errs().
      void PrintMessage(SMLoc Loc, DiagKind Kind, const Twine &Msg,
                        ArrayRef<SMRange> Ranges = None,
                        ArrayRef<SMFixIt> FixIts = None,
                        bool ShowColors = true) const;

      /// Emits a manually-constructed diagnostic to the given output stream.
      ///
      /// \param ShowColors Display colored messages if output is a terminal and
      /// the default error handler is used.
      void PrintMessage(raw_ostream &OS, const SMDiagnostic &Diagnostic,
                        bool ShowColors = true) const;
      ...
    };
llvm/lib/Support/SourceMgr.cpp
    void SourceMgr::PrintMessage(raw_ostream &OS, const SMDiagnostic &Diagnostic,
                                 bool ShowColors) const {
      ...
      if (Diagnostic.getLoc().isValid()) {
        unsigned CurBuf = FindBufferContainingLoc(Diagnostic.getLoc());
        assert(CurBuf && "Invalid or unspecified location!");
        PrintIncludeStack(getBufferInfo(CurBuf).IncludeLoc, OS);
      }

      Diagnostic.print(nullptr, OS, ShowColors);
    }

    void SourceMgr::PrintMessage(raw_ostream &OS, SMLoc Loc,
                                 SourceMgr::DiagKind Kind,
                                 const Twine &Msg, ArrayRef<SMRange> Ranges,
                                 ArrayRef<SMFixIt> FixIts, bool ShowColors) const {
      PrintMessage(OS, GetMessage(Loc, Kind, Msg, Ranges, FixIts), ShowColors);
    }

    void SourceMgr::PrintMessage(SMLoc Loc, SourceMgr::DiagKind Kind,
                                 const Twine &Msg, ArrayRef<SMRange> Ranges,
                                 ArrayRef<SMFixIt> FixIts, bool ShowColors) const {
      PrintMessage(errs(), Loc, Kind, Msg, Ranges, FixIts, ShowColors);
    }
The `llvm::SMDiagnostic` class is also declared in the `llvm::SourceMgr` header file. It simply stores the data necessary when printing a diagnostic, and it declares the `llvm::SMDiagnostic::print` member function. This header also declares `llvm::SMFixIt`, which is also a simple bag of data:
llvm/include/llvm/Support/SourceMgr.h
    /// Represents a single fixit, a replacement of one range of text with another.
    class SMFixIt {
      SMRange Range;

      std::string Text;
      ...
    };

    /// Instances of this class encapsulate one diagnostic report, allowing
    /// printing to a raw_ostream as a caret diagnostic.
    class SMDiagnostic {
      const SourceMgr *SM = nullptr;
      SMLoc Loc;
      std::string Filename;
      int LineNo = 0;
      int ColumnNo = 0;
      SourceMgr::DiagKind Kind = SourceMgr::DK_Error;
      std::string Message, LineContents;
      std::vector<std::pair<unsigned, unsigned>> Ranges;
      SmallVector<SMFixIt, 4> FixIts;

    public:
      ...
      // Diagnostic with a location.
      SMDiagnostic(const SourceMgr &sm, SMLoc L, StringRef FN,
                   int Line, int Col, SourceMgr::DiagKind Kind,
                   StringRef Msg, StringRef LineStr,
                   ArrayRef<std::pair<unsigned,unsigned>> Ranges,
                   ArrayRef<SMFixIt> FixIts = None);
      ...
      void print(const char *ProgName, raw_ostream &S, bool ShowColors = true,
                 bool ShowKindLabel = true) const;
    };
Note that `llvm::SMDiagnostic` is initialized with not just the `std::string` message that will be printed, but also the `std::string` source line to print as well. The `llvm::SourceMgr::GetMessage` member function, which is called when `llvm::SourceMgr::PrintMessage` is used to emit a diagnostic, grabs the source line, called `LineStr` below, by searching for newline characters in the source buffer.
llvm/lib/Support/SourceMgr.cpp
    SMDiagnostic SourceMgr::GetMessage(SMLoc Loc, SourceMgr::DiagKind Kind,
                                       const Twine &Msg,
                                       ArrayRef<SMRange> Ranges,
                                       ArrayRef<SMFixIt> FixIts) const {
      ...
      // Scan backward to find the start of the line.
      const char *LineStart = Loc.getPointer();
      const char *BufStart = CurMB->getBufferStart();
      while (LineStart != BufStart && LineStart[-1] != '\n' &&
             LineStart[-1] != '\r')
        --LineStart;

      // Get the end of the line.
      const char *LineEnd = Loc.getPointer();
      const char *BufEnd = CurMB->getBufferEnd();
      while (LineEnd != BufEnd && LineEnd[0] != '\n' && LineEnd[0] != '\r')
        ++LineEnd;
      LineStr = std::string(LineStart, LineEnd);
      ...
      return SMDiagnostic(*this, Loc, BufferID, LineAndCol.first,
                          LineAndCol.second-1, Kind, Msg.str(),
                          LineStr, ColRanges, FixIts);
    }
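The two scanning loops in that excerpt can be tried in isolation. The sketch below restates them with standard-library types only; `lineContaining` is a hypothetical helper that takes a `std::string` and an offset where the LLVM code works with raw pointers into an `llvm::MemoryBuffer`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the full text of the line that contains the character at Offset,
// without its line terminator.
std::string lineContaining(const std::string &Buffer, std::size_t Offset) {
  assert(Offset <= Buffer.size() && "offset must lie within the buffer");

  // Scan backward to the character just after the previous line break.
  std::size_t Start = Offset;
  while (Start != 0 && Buffer[Start - 1] != '\n' && Buffer[Start - 1] != '\r')
    --Start;

  // Scan forward to the next line break or the end of the buffer.
  std::size_t End = Offset;
  while (End != Buffer.size() && Buffer[End] != '\n' && Buffer[End] != '\r')
    ++End;

  return Buffer.substr(Start, End - Start);
}
```

Given the sample program and the offset of the first quotation mark, this returns the second line with its leading indentation intact, which is the text a diagnostic prints above the caret.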
The implementation of the `llvm::SMDiagnostic::print` function is what actually prints textual output (by default to `llvm::errs()`, LLVM's stream for standard error, much like `std::cerr`). It's a fairly simple function, composed mostly of `if` and print statements:
llvm/lib/Support/SourceMgr.cpp
    void SMDiagnostic::print(const char *ProgName, raw_ostream &S, bool ShowColors,
                             bool ShowKindLabel) const {
      ...
      if (ProgName && ProgName[0])
        S << ProgName << ": ";

      if (!Filename.empty()) {
        if (Filename == "-")
          S << "<stdin>";
        else
          S << Filename;

        if (LineNo != -1) {
          S << ':' << LineNo;
          if (ColumnNo != -1)
            S << ':' << (ColumnNo+1);
        }
        S << ": ";
      }

      if (ShowKindLabel) {
        switch (Kind) {
        case SourceMgr::DK_Error:
          ...
          S << "error: ";
          break;
        case SourceMgr::DK_Warning:
          ...
          S << "warning: ";
          break;
        case SourceMgr::DK_Note:
          ...
          S << "note: ";
          break;
        case SourceMgr::DK_Remark:
          ...
          S << "remark: ";
          break;
        }
        ...
      }

      S << Message << '\n';
      ...
      // Build the line with the caret and ranges.
      std::string CaretLine(NumColumns+1, ' ');

      // Expand any ranges.
      for (unsigned r = 0, e = Ranges.size(); r != e; ++r) {
        std::pair<unsigned, unsigned> R = Ranges[r];
        std::fill(&CaretLine[R.first],
                  &CaretLine[std::min((size_t)R.second, CaretLine.size())],
                  '~');
      }

      // Add any fix-its.
      ...
      std::string FixItInsertionLine;
      buildFixItLine(CaretLine, FixItInsertionLine, FixIts,
                     makeArrayRef(Loc.getPointer() - ColumnNo,
                                  LineContents.size()));

      // Finally, plop on the caret.
      if (unsigned(ColumnNo) <= NumColumns)
        CaretLine[ColumnNo] = '^';
      else
        CaretLine[NumColumns] = '^';

      // ... and remove trailing whitespace so the output doesn't wrap for it. We
      // know that the line isn't completely empty because it has the caret in it at
      // least.
      CaretLine.erase(CaretLine.find_last_not_of(' ')+1);

      printSourceLine(S, LineContents);
      ...
      S << '\n';
    }
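Two pieces of that function are worth isolating: the `file:line:column: kind: message` header and the line of spaces, tildes, and a caret. The sketch below reimplements both with the standard library only. `formatHeader` and `buildCaretLine` are hypothetical names, colors and fix-its are left out, and the header helper assumes a valid line and column whenever a filename is present.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Builds "file:line:col: kind: message". Column0 is 0-based, as it is stored
// in the diagnostic, and is printed 1-based. Unnamed buffers get no location.
std::string formatHeader(const std::string &Filename, int Line, int Column0,
                         const std::string &Kind, const std::string &Message) {
  std::string Out;
  if (!Filename.empty())
    Out += Filename + ':' + std::to_string(Line) + ':' +
           std::to_string(Column0 + 1) + ": ";
  return Out + Kind + ": " + Message;
}

// Builds the line printed under the source text. Ranges are half-open
// [first, second) pairs of 0-based columns; Column is the caret's column.
std::string buildCaretLine(
    std::size_t NumColumns, std::size_t Column,
    const std::vector<std::pair<std::size_t, std::size_t>> &Ranges) {
  // One extra column so a caret can sit just past the end of the line.
  std::string CaretLine(NumColumns + 1, ' ');

  // Underline every range with tildes, clamped to the line's width.
  for (const auto &R : Ranges) {
    assert(R.first <= R.second && "range must not be reversed");
    std::size_t End = std::min(R.second, CaretLine.size());
    for (std::size_t I = R.first; I < End; ++I)
      CaretLine[I] = '~';
  }

  // The caret overwrites whatever is at its column.
  CaretLine[std::min(Column, NumColumns)] = '^';

  // Drop trailing spaces; the caret guarantees the line is not empty.
  CaretLine.erase(CaretLine.find_last_not_of(' ') + 1);
  return CaretLine;
}
```

For the line containing `print("Hello!")`, a caret at 0-based column 8 with the range 8 to 16 gives eight spaces, a caret, and then tildes up to but not including the closing parenthesis.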
What I wrote about above represents pretty much everything libLLVMSupport provides in the way of emitting diagnostics. I enjoy reading code like this in libLLVMSupport because it demystifies things that I would otherwise take for granted. The functionality in `llvm::SourceMgr` and `llvm::SMDiagnostic` really boils down to storing strings in memory and printing to the console, but for some reason as a compiler user I had always assumed there was something more magical going on behind the scenes. Maybe you felt similarly after reading this article.
I did, however, leave out some details, such as the internals of `llvm::MemoryBuffer`, and some of the complex caching logic that speeds up functions like `llvm::SourceMgr::getLineAndColumn`. The former is especially interesting and I hope to write about it more someday – take a look yourself if you have the chance, by reading through `llvm/include/llvm/Support/MemoryBuffer.h` and `llvm/lib/Support/MemoryBuffer.cpp`.
In the next article, I'll explain more about how the apple/swift codebase wraps `llvm::SourceMgr` with the `swift::SourceManager` class, how `swift::CompilerInstance` instantiates an `llvm::MemoryBuffer` for each source file, and about other Swift-specific abstractions, such as `swift::DiagnosticEngine`.