Option Parsing in the Swift Compiler
The last article in this series explained how libswiftDriver split up swift executable invocations into smaller sub-jobs. Parsing command-line arguments is a big part of that work, but I didn't go into much detail about it. This article will now explain command-line argument parsing in depth.
Specifically, this article describes how:
- Within the Swift build system, LLVM TableGen is used to transform the options specified in swift/include/swift/Option/Options.td. The transformed output is written to a file named Options.inc.
- A libswiftOption header includes the Options.inc file in order to define an enum named swift::options::ID. By defining a macro before including the Options.inc file, it's able to define an enum case for each option defined in the original Options.td file: swift::options::ID::OPT_driver_print_jobs, swift::options::ID::OPT_driver_print_actions, and so on.
- In its implementation, libswiftOption defines a macro, then includes the Options.inc file a second time, this time in order to initialize an llvm::opt::OptTable. The OptTable class is defined in libLLVMOption and provides argument parsing utilities.
- In its swift::Driver::buildCompilation method, libswiftDriver calls the llvm::opt::OptTable::ParseArgs method. This takes the array of strings passed into the swift compiler executable's main function as an argument, and it returns an llvm::opt::InputArgList. This class defines the query methods used throughout the Swift compiler codebase to inspect the parsed arguments. The InputArgList::hasArg method, which checks for the presence of an argument, is perhaps the most common.
Many contributions to the Swift compiler involve modifying or adding command-line options. Understanding how these options are parsed has helped me make such contributions.
An introduction to LLVM TableGen
TableGen is a utility program for LLVM developers. An llvm-tblgen executable is built as part of LLVM, and can be found in the LLVM build directory of a Swift build tree, in /path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen.
TableGen is documented in detail in the LLVM documentation, but for my purposes it's sufficient to understand it as a tool that transforms the syntax in Swift's option files – swift/include/swift/Option/Options.td, swift/include/swift/Option/FrontendOptions.td, and swift/tools/SourceKit/tools/sourcekitd-test/Options.td – into a syntax that looks like C macro invocations.
For example, the -driver-print-jobs compiler option, mentioned in several past articles, is defined in Options.td like so:
swift/include/swift/Option/Options.td
    include "llvm/Option/OptParser.td"
    ..
    def driver_print_jobs : Flag<["-"], "driver-print-jobs">, InternalDebugOpt,
      HelpText<"Dump list of jobs to execute">;
I can use llvm-tblgen to transform the file in which it's defined:
    /path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen \
      -I ~/local/Source/apple/llvm/include \
      -I ~/local/Source/apple/swift/include/swift/Option \
      ~/local/Source/apple/swift/include/swift/Option/Options.td \
      -gen-opt-parser-defs
This takes all of the options in Options.td, and outputs them as calls to a C macro named OPTION. For example, here's the output that corresponds to the -driver-print-jobs definition shown above:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
The LLVM TableGen executable llvm-tblgen can output many different formats. I instructed it to output calls to the OPTION macro by using the -gen-opt-parser-defs argument. Other arguments include -gen-ctags, which generates definitions and source locations for the popular source indexing utility ctags, and -print-records, which prints TableGen's internal representation of each entry.
With that prerequisite explanation of LLVM TableGen out of the way, we're ready to look at the four stages of how arguments are parsed in the Swift compiler.
Stage 1: Swift's CMake instructs TableGen to transform Options.td into Options.inc
If you haven't already, try reading "The Swift Compiler's Build System" and "Reading and Understanding the CMake in apple/swift" before continuing – you'll need to know the basics of CMake in order to enjoy this section.
A git grep for "driver-print-jobs" reveals that this option is defined in swift/include/swift/Option/Options.td. In fact, nearly every option supported by the Swift compiler is defined in this file. And in that same directory, a CMakeLists.txt file defines how these options are transformed by TableGen:
swift/include/swift/Option/CMakeLists.txt
    set(LLVM_TARGET_DEFINITIONS Options.td)
    swift_tablegen(Options.inc -gen-opt-parser-defs)
    swift_add_public_tablegen_target(SwiftOptions)
Note that, unlike CMake functions that take their input files as arguments, the swift_tablegen CMake macro requires its input files be specified using a variable named LLVM_TARGET_DEFINITIONS, which must be set before the macro is called. The macro also allows arbitrary options to be passed to the llvm-tblgen executable. In this case, that's -gen-opt-parser-defs – the option that transforms TableGen definitions into OPTION macro calls.
swift_tablegen and swift_add_public_tablegen_target are CMake macros defined in swift/cmake/modules/AddSwiftTableGen.cmake. These two macros call through to the LLVM CMake functions tablegen and add_public_tablegen_target:
- The LLVM CMake function tablegen instructs CMake on how to produce the specified output file (Options.inc, in this case).
- The LLVM CMake function add_public_tablegen_target creates a public CMake target that depends on the production of the Options.inc file.
Here's the tablegen LLVM CMake function. It uses the built-in CMake function add_custom_command to define how llvm-tblgen should be run:
llvm/cmake/modules/TableGen.cmake
    function(tablegen project ofn)
      ..
      if (IS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
        set(LLVM_TARGET_DEFINITIONS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
      else()
        set(LLVM_TARGET_DEFINITIONS_ABSOLUTE
          ${CMAKE_CURRENT_SOURCE_DIR}/${LLVM_TARGET_DEFINITIONS})
      endif()
      ..
      add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
        ..
        COMMAND ${${project}_TABLEGEN_EXE} ${ARGN} -I ${CMAKE_CURRENT_SOURCE_DIR}
        ${LLVM_TABLEGEN_FLAGS}
        ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
        ..
        COMMENT "Building ${ofn}..."
        )
      add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
        # Only update the real output file if there are any differences.
        # This prevents recompilation of all the files depending on it if there
        # aren't any.
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
            ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
            ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
        DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
        COMMENT "Updating ${ofn}..."
        )
      ..
      set(TABLEGEN_OUTPUT ${TABLEGEN_OUTPUT} ${CMAKE_CURRENT_BINARY_DIR}/${ofn} PARENT_SCOPE)
      ..
    endfunction()
And here's the add_public_tablegen_target LLVM CMake function:
llvm/cmake/modules/TableGen.cmake
    function(add_public_tablegen_target target)
      ...
      add_custom_target(${target}
        DEPENDS ${TABLEGEN_OUTPUT})
      ...
    endfunction()
The add_public_tablegen_target function calls the built-in CMake function add_custom_target in order to define a CMake target. In this case, the name of that target is SwiftOptions. I can manually generate the TableGen output for SwiftOptions by invoking cmake --build on the command line, specifying SwiftOptions as the target I'd like to build:
    cmake --build \
      /path/to/build/swift-macosx-x86_64 \
      --target SwiftOptions
The above command does essentially the same thing as the manual llvm-tblgen -gen-opt-parser-defs invocation shown earlier in this article.
Instead of having users manually build the SwiftOptions target, the CMake targets for several Swift compiler libraries, such as libswiftOption and libswiftDriver, specify a dependency upon SwiftOptions. For example, here's libswiftOption:
swift/lib/Option/CMakeLists.txt
    add_swift_library(swiftOption STATIC
      ..
      DEPENDS SwiftOptions
As a result of this declared dependency, building libswiftOption results in SwiftOptions being built first, which means llvm-tblgen is run on swift/include/swift/Option/Options.td in order to produce the file /path/to/build/swift-macosx-x86_64/include/swift/Option/Options.inc. And again, because the swift/include/swift/Option/CMakeLists.txt file specifies that llvm-tblgen be invoked with the -gen-opt-parser-defs argument, the Options.inc file is populated with one call to an OPTION macro for each option defined in the original Options.td file.
Stage 2: The libswiftOption headers declare an enum containing a case for each of the options
In stage one, llvm-tblgen transformed the Options.td file into a file named Options.inc. For each option defined in the original Options.td, the new Options.inc file contains an invocation of a macro named OPTION:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
This is useful because I can define the OPTION macro to do whatever I want, then #include "Options.inc" in order to have the macro invoked once for each option.
This is exactly what the libswiftOption header Options.h does in order to define the swift::options::ID enum. That enum contains a case for each option defined in Options.td. Without TableGen, that would mean manually listing each of these options out in the enum, like this:
    namespace swift {
    namespace options {
      enum ID {
        OPT_driver_print_jobs,
        OPT_driver_print_actions,
        OPT_driver_skip_execution,
        OPT_driver_use_frontend_path,
        // ...182 more options.
      };
    } // namespace options
    } // namespace swift
Every time a Swift compiler developer wanted to add or remove a Swift option, they'd have to manually add or remove it from that enum. That's not only tedious, it's also error-prone.
Instead, the swift/include/swift/Option/Options.h header defines the OPTION macro such that it pastes the prefix OPT_ onto each option's identifier, producing one token of the form OPT_<option-identifier> for each definition in Options.inc:
swift/include/swift/Option/Options.h
    namespace swift {
    namespace options {
    ..
      enum ID {
        OPT_INVALID = 0,
    #define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
                   HELPTEXT, METAVAR, VALUES)                                      \
      OPT_##ID,
    #include "swift/Option/Options.inc"
        LastOption
    #undef OPTION
      };
This results in an enum with several hundred cases, one for each option defined in Options.inc. Note that the macro definition of OPTION above only uses the third argument passed into it, which it calls ID. For -driver-print-jobs, that third parameter is driver_print_jobs:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
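To see the enum trick in isolation, here's a minimal sketch that compiles outside of a Swift build tree. Because Options.inc only exists once TableGen has run, a hypothetical DEMO_OPTIONS macro stands in for it, and its OPTION calls take three arguments rather than twelve. The demo namespace and DEMO_OPTIONS are my own assumptions for illustration, not names from the Swift codebase:

```cpp
// Hypothetical stand-in for the generated Options.inc file: one OPTION(...)
// call per option. The real file's OPTION calls take twelve arguments.
#define DEMO_OPTIONS                                                           \
  OPTION("driver-print-jobs", driver_print_jobs,                               \
         "Dump list of jobs to execute")                                       \
  OPTION("driver-print-actions", driver_print_actions,                         \
         "Dump list of actions to perform")

namespace demo {

enum ID {
  OPT_INVALID = 0,
// Only the second argument is used: it is pasted onto the OPT_ prefix.
#define OPTION(NAME, ID, HELPTEXT) OPT_##ID,
  DEMO_OPTIONS
#undef OPTION
  LastOption
};

} // namespace demo
```

After preprocessing, the enum contains OPT_INVALID, OPT_driver_print_jobs, OPT_driver_print_actions, and LastOption, in that order. Adding a third OPTION call to DEMO_OPTIONS would add a third case without touching the enum itself.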
Stage 3: libswiftOption instantiates an llvm::opt::OptTable, with an llvm::opt::OptTable::Info element for each of the options
LLVM provides a library, named libLLVMOption, that encapsulates common operations related to command-line argument parsing. For instance, libLLVMOption defines the llvm::opt::OptTable and llvm::opt::InputArgList classes:
- The OptTable class is instantiated with a list of options to parse, and provides a method named ParseArgs. This method takes the list of strings passed into an executable, compares them to the options the OptTable is supposed to parse, and returns an InputArgList. Through its MissingArgIndex and MissingArgCount out-parameters, it also reports an option that was specified without the values it requires.
- InputArgList defines a method named hasArg. This method can be used to check whether an option was specified.
libswiftOption is responsible for instantiating an llvm::opt::OptTable that's used to parse arguments passed to the swift executable. The llvm::opt::OptTable constructor takes an array of options it's supposed to parse. These options are represented using the llvm::opt::OptTable::Info struct:
llvm/include/llvm/Option/OptTable.h
    class OptTable {
    ..
      struct Info {
        ..
        const char *const *Prefixes;
        const char *Name;
        const char *HelpText;
        const char *MetaVar;
        unsigned ID;
        unsigned char Kind;
        unsigned char Param;
        unsigned short Flags;
        unsigned short GroupID;
        unsigned short AliasID;
        const char *AliasArgs;
        const char *Values;
      };
    ..
    protected:
      OptTable(ArrayRef<Info> OptionInfos, bool IgnoreCase = false);
In order to create this array of options, libswiftOption once again defines the OPTION macro and includes the Options.inc file. This time, it defines the OPTION macro such that it list-initializes an OptTable::Info struct for each call, and it uses those to statically define an array named InfoTable[]:
swift/lib/Option/Options.cpp
    #include "swift/Option/Options.h"
    ..
    #include "llvm/Option/OptTable.h"
    #include "llvm/Option/Option.h"

    using namespace swift::options;
    using namespace llvm::opt;
    ..
    static const OptTable::Info InfoTable[] = {
    #define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
                   HELPTEXT, METAVAR, VALUES)                                      \
      {PREFIX, NAME, HELPTEXT, METAVAR, OPT_##ID, Option::KIND##Class,             \
       PARAM, FLAGS, OPT_##GROUP, OPT_##ALIAS, ALIASARGS, VALUES},
    #include "swift/Option/Options.inc"
    #undef OPTION
    };
After pre-processing, the OPTION macro calls are expanded such that the InfoTable[] initializer above looks like this:
    static const OptTable::Info InfoTable[] = {
      {{"-", nullptr}, "driver-print-jobs", "Dump list of jobs to execute", nullptr,
       OPT_driver_print_jobs, Option::FlagClass, 0,
       HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group,
       OPT_INVALID, nullptr, nullptr},
      {{"-", nullptr}, "driver-print-actions", "Dump list of actions to perform", nullptr,
       OPT_driver_print_actions, Option::FlagClass, 0,
       HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group,
       OPT_INVALID, nullptr, nullptr},
      // ...184 more options.
    };
Note that the enum values from stage two are used in these static option definitions: OPT_driver_print_jobs, OPT_driver_print_actions, and so on.
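The same list of OPTION calls can be expanded twice, once per macro definition. The sketch below shows both expansions side by side, under the same assumptions as before: DEMO_OPTIONS is a hypothetical stand-in for Options.inc, demo::Info is a three-field simplification of llvm::opt::OptTable::Info, and findOption is a helper I made up to show the table being used. None of these names exist in Swift or LLVM.

```cpp
#include <cstring>

// Hypothetical stand-in for the generated Options.inc file.
#define DEMO_OPTIONS                                                           \
  OPTION("driver-print-jobs", driver_print_jobs,                               \
         "Dump list of jobs to execute")                                       \
  OPTION("driver-print-actions", driver_print_actions,                         \
         "Dump list of actions to perform")

namespace demo {

// First expansion: one enum case per option.
enum ID {
  OPT_INVALID = 0,
#define OPTION(NAME, ID, HELPTEXT) OPT_##ID,
  DEMO_OPTIONS
#undef OPTION
  LastOption
};

// Simplified stand-in for llvm::opt::OptTable::Info.
struct Info {
  const char *Name;
  const char *HelpText;
  unsigned ID;
};

// Second expansion: one list-initialized Info per option. Each entry refers
// back to the enum case generated by the first expansion.
static const Info InfoTable[] = {
#define OPTION(NAME, ID, HELPTEXT) {NAME, HELPTEXT, OPT_##ID},
    DEMO_OPTIONS
#undef OPTION
};

// Returns the table entry with the given name, or nullptr if there is none.
inline const Info *findOption(const char *Name) {
  for (const Info &I : InfoTable)
    if (std::strcmp(I.Name, Name) == 0)
      return &I;
  return nullptr;
}

} // namespace demo
```

Because both the enum and the table are generated from the same list, they can't drift out of sync: every table entry's ID is, by construction, a valid enum case.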
To allow other parts of the Swift compiler to have access to this table of options, libswiftOption creates an llvm::opt::OptTable subclass named SwiftOptTable, and initializes it with the InfoTable[] array from above:
swift/lib/Option/Options.cpp
    class SwiftOptTable : public OptTable {
    public:
      SwiftOptTable() : OptTable(InfoTable) {}
    };
It also defines a function that allows other libraries to obtain a new instance of the option table:
swift/lib/Option/Options.cpp
    std::unique_ptr<OptTable> swift::createSwiftOptTable() {
      return std::unique_ptr<OptTable>(new SwiftOptTable());
    }
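One way to read this design is as a factory function: callers only ever see the base class, and the concrete subclass stays private to the implementation file. Here's a rough sketch of that pattern using toy classes. demo::OptTable, DemoOptTable, and createDemoOptTable are hypothetical stand-ins, and placing the subclass in an anonymous namespace is my own choice for the sketch rather than something the excerpt above shows.

```cpp
#include <memory>

namespace demo {

// Toy base class standing in for llvm::opt::OptTable. As in the excerpt
// from OptTable.h, the constructor is protected, so only subclasses can
// create instances.
class OptTable {
public:
  virtual ~OptTable() = default;
  unsigned getNumOptions() const { return NumOptions; }

protected:
  explicit OptTable(unsigned NumOptions) : NumOptions(NumOptions) {}

private:
  unsigned NumOptions;
};

// The only declaration a public header would need to expose.
std::unique_ptr<OptTable> createDemoOptTable();

namespace {
// The concrete subclass is invisible outside of this file.
class DemoOptTable : public OptTable {
public:
  DemoOptTable() : OptTable(2) {}
};
} // namespace

std::unique_ptr<OptTable> createDemoOptTable() {
  return std::unique_ptr<OptTable>(new DemoOptTable());
}

} // namespace demo
```

A caller writes auto Table = demo::createDemoOptTable(); and works with the result purely through the base class interface.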
Stage 4: libswiftDriver calls the llvm::opt::OptTable::ParseArgs method
As I briefly mentioned in the previous article in this series, the swift::Driver constructor calls the createSwiftOptTable function in order to obtain an instance of the llvm::opt::OptTable subclass SwiftOptTable, storing it in a member variable named Driver::Opts:
swift/lib/Driver/Driver.cpp
    Driver::Driver(StringRef DriverExecutable,
                   StringRef Name,
                   ArrayRef<const char *> Args,
                   DiagnosticEngine &Diags)
      : Opts(createSwiftOptTable()), Diags(Diags),
Then, the Driver has the command-line arguments to the swift executable parsed, in order to create and configure an instance of swift::Compilation. To parse the arguments, it calls the llvm::opt::OptTable::ParseArgs method:
swift/lib/Driver/Driver.cpp
    std::unique_ptr<InputArgList>
    Driver::parseArgStrings(ArrayRef<const char *> Args) {
      ...
      ArgList = llvm::make_unique<InputArgList>(
          getOpts().ParseArgs(Args, MissingArgIndex, MissingArgCount,
                              IncludedFlagsBitmask, ExcludedFlagsBitmask));
      ...
      return ArgList;
    }
The InputArgList returned by the Driver::parseArgStrings method is used throughout libswiftDriver. For example, the previous article in this series showcased the following code, which produced a warning if the -incremental and -whole-module-optimization arguments were used in the same invocation:
swift/lib/Driver/Driver.cpp
    bool Incremental = ArgList->hasArg(options::OPT_incremental);
    if (ArgList->hasArg(options::OPT_whole_module_optimization)) {
      if (Incremental && ShowIncrementalBuildDecisions) {
        llvm::outs() << "Incremental compilation has been disabled, because it "
                     << "is not compatible with whole module optimization.";
      }
      Incremental = false;
    }
The code above uses the llvm::opt::InputArgList::hasArg method to check for the OPT_incremental and OPT_whole_module_optimization arguments. Remember, these were cases that were added to the swift::options::ID enum by including Options.inc in stage two above! It all comes together here.
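To make the relationship between the option table, parsing, and hasArg concrete, here's a toy version of the whole pipeline in plain C++. It is emphatically not how libLLVMOption is implemented: the real parser also handles prefixes, values, aliases, and groups, whereas this sketch only matches whole strings. Every name in the demo namespace, including the Unknown out-parameter, is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

namespace demo {

enum ID { OPT_INVALID = 0, OPT_incremental, OPT_whole_module_optimization };

// Toy table entry: the full spelling of a flag and its enum case.
struct Info {
  const char *Spelling;
  ID Id;
};

static const Info InfoTable[] = {
    {"-incremental", OPT_incremental},
    {"-whole-module-optimization", OPT_whole_module_optimization},
};

// Toy counterpart of InputArgList: remembers which options were seen.
class ArgList {
public:
  void add(ID Id) { Ids.push_back(Id); }
  bool hasArg(ID Id) const {
    return std::find(Ids.begin(), Ids.end(), Id) != Ids.end();
  }

private:
  std::vector<ID> Ids;
};

// Toy counterpart of ParseArgs: matches each string against the table.
// Strings that match no entry are appended to Unknown.
inline ArgList parseArgs(const std::vector<std::string> &Args,
                         std::vector<std::string> &Unknown) {
  ArgList Parsed;
  for (const std::string &Arg : Args) {
    bool Matched = false;
    for (const Info &I : InfoTable) {
      if (Arg == I.Spelling) {
        Parsed.add(I.Id);
        Matched = true;
        break;
      }
    }
    if (!Matched)
      Unknown.push_back(Arg);
  }
  return Parsed;
}

// Mirrors the decision in the Driver.cpp excerpt: whole module optimization
// switches incremental compilation off.
inline bool incrementalEnabled(const ArgList &Args) {
  bool Incremental = Args.hasArg(OPT_incremental);
  if (Args.hasArg(OPT_whole_module_optimization))
    Incremental = false;
  return Incremental;
}

} // namespace demo
```

Parsing {"-incremental", "-whole-module-optimization"} with this sketch produces an ArgList for which incrementalEnabled returns false, while parsing {"-incremental"} alone produces one for which it returns true.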
Besides checking for the existence of certain options, InputArgList also has methods to grab the values specified for those options. Here's how libswiftFrontend grabs the value from the popular -warn-long-function-bodies= command-line option, by using the llvm::opt::InputArgList::getLastArg and llvm::opt::Arg::getValue methods:
swift/lib/Frontend/CompilerInvocation.cpp
    if (const Arg *A = Args.getLastArg(OPT_warn_long_function_bodies)) {
      unsigned attempt;
      if (StringRef(A->getValue()).getAsInteger(10, attempt)) {
        Diags.diagnose(SourceLoc(), diag::error_invalid_arg_value,
                       A->getAsString(Args), A->getValue());
      } else {
        Opts.WarnLongFunctionBodies = attempt;
      }
    }
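Two details in that excerpt are easy to miss: getLastArg means the last occurrence of the option wins, and getAsInteger returns true when parsing fails. The following sketch imitates both behaviors in plain C++ so they can be tried without LLVM. The helpers getLastValue, getAsInteger, and applyWarnLongFunctionBodies, as well as the demo::Options struct, are hypothetical names of my own, and the parsing is a simplification of what LLVM actually does.

```cpp
#include <limits>
#include <string>
#include <vector>

namespace demo {

// Toy counterpart of getLastArg + getValue for a joined "-name=value"
// option. Returns true, and stores the text after Prefix, if any argument
// starts with Prefix. Later occurrences overwrite earlier ones.
inline bool getLastValue(const std::vector<std::string> &Args,
                         const std::string &Prefix, std::string &Value) {
  bool Found = false;
  for (const std::string &Arg : Args) {
    if (Arg.compare(0, Prefix.size(), Prefix) == 0) {
      Value = Arg.substr(Prefix.size());
      Found = true; // Keep scanning so that the last occurrence wins.
    }
  }
  return Found;
}

// Parses Text as a base-10 unsigned integer. Like StringRef::getAsInteger
// in the excerpt above, this returns true on FAILURE.
inline bool getAsInteger(const std::string &Text, unsigned &Result) {
  if (Text.empty())
    return true;
  unsigned long long Parsed = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return true;
    Parsed = Parsed * 10 + static_cast<unsigned>(C - '0');
    if (Parsed > std::numeric_limits<unsigned>::max())
      return true;
  }
  Result = static_cast<unsigned>(Parsed);
  return false;
}

struct Options {
  unsigned WarnLongFunctionBodies = 0;
};

// Returns false if the option was present but had an invalid value, which
// is where the real code emits a diagnostic.
inline bool applyWarnLongFunctionBodies(const std::vector<std::string> &Args,
                                        Options &Opts) {
  std::string Value;
  if (!getLastValue(Args, "-warn-long-function-bodies=", Value))
    return true; // Option not specified; nothing to do.
  unsigned Attempt;
  if (getAsInteger(Value, Attempt))
    return false;
  Opts.WarnLongFunctionBodies = Attempt;
  return true;
}

} // namespace demo
```

Given the arguments -warn-long-function-bodies=100 and -warn-long-function-bodies=250, in that order, this sketch stores 250. Given -warn-long-function-bodies=abc, it reports failure and leaves the stored value untouched.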
Sorry, no magic here
I enjoy looking into the details of the Swift compiler because – and maybe this sounds silly – it helps me better understand that it's "just a program".
Because I was unfamiliar with the LLVM TableGen utility, and with the C/C++ macros that the Swift compiler uses to define its options, it seemed like magic to me that modifying the Options.td file would result in changes to Swift's command-line options. But it's not magic – as this article described, it's a four-stage process in which:
- The Options.td file is transformed by TableGen.
- The transformed file, Options.inc, is included such that it defines a large enum with all the Swift options as values.
- The transformed file is included again, this time to initialize an LLVM OptTable. This class is capable of searching command-line arguments for option values.
- The rest of the Swift compiler codebase uses the InputArgList produced by the LLVM OptTable class to check for arguments as necessary.