Option Parsing in the Swift Compiler
The last article in this series explained how libswiftDriver split up swift executable invocations into smaller sub-jobs. Parsing command-line arguments is a big part of that work, but I didn't go into much detail about it. This article will now explain command-line argument parsing in depth.
Specifically, this article describes how:
- Within the Swift build system, LLVM TableGen is used to transform the options specified in swift/include/swift/Option/Options.td. The transformed output is written to a file named Options.inc.
- A libswiftOption header includes the Options.inc file in order to define an enum named swift::options::ID. By defining a macro before including the Options.inc file, it's able to define an enum case for each option defined in the original Options.td file: swift::options::ID::OPT_driver_print_jobs, swift::options::ID::OPT_driver_print_actions, and so on.
- In its implementation, libswiftOption defines a macro, then includes the Options.inc file a second time, this time in order to initialize an llvm::opt::OptTable. The OptTable class is defined in libLLVMOption and provides argument parsing utilities.
- In its swift::Driver::buildCompilation method, libswiftDriver calls the llvm::opt::OptTable::ParseArgs method. This takes the array of strings passed into the swift compiler executable's main function as an argument, and it returns an llvm::opt::InputArgList. This class defines the methods used throughout the Swift compiler codebase to query the parsed arguments. The InputArgList::hasArg method, which checks for the presence of an argument, is perhaps the most common.
Many contributions to the Swift compiler involve modifying or adding command-line options. Understanding how these options are parsed has helped me make such contributions.
An introduction to LLVM TableGen
TableGen is a utility program for LLVM developers. An llvm-tblgen executable is built as part of LLVM, and can be found in the LLVM build directory of a Swift build tree, in /path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen.
TableGen is documented in detail in the LLVM documentation, but for my purposes it's sufficient to understand it as a tool that transforms the syntax in Swift's option files – swift/include/swift/Option/Options.td, swift/include/swift/Option/FrontendOptions.td, and swift/tools/SourceKit/tools/sourcekitd-test/Options.td – into a syntax that looks like C macro invocations.
For example, the -driver-print-jobs compiler option, mentioned in several past articles, is defined in Options.td like so:
swift/include/swift/Option/Options.td
    include "llvm/Option/OptParser.td"
    // ...
    def driver_print_jobs : Flag<["-"], "driver-print-jobs">, InternalDebugOpt,
      HelpText<"Dump list of jobs to execute">;
I can use llvm-tblgen to transform the file in which it's defined:
    /path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen \
      -I ~/local/Source/apple/llvm/include \
      -I ~/local/Source/apple/swift/include/swift/Option \
      ~/local/Source/apple/swift/include/swift/Option/Options.td \
      -gen-opt-parser-defs
This takes all of the options in Options.td, and outputs them as calls to a C macro named OPTION. For example, here's the output that corresponds to the -driver-print-jobs definition shown above:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
The LLVM TableGen executable llvm-tblgen can output many different formats. I instructed it to output calls to the OPTION macro by using the -gen-opt-parser-defs argument. Other arguments include -gen-ctags, which generates definitions and source locations for the popular source indexing utility ctags, and -print-records, which prints TableGen's internal representation of each entry.
With that prerequisite explanation of LLVM TableGen out of the way, we're ready to look at the four stages of how arguments are parsed in the Swift compiler.
Stage 1: Swift's CMake instructs TableGen to transform Options.td into Options.inc
If you haven't already, try reading The Swift Compiler's Build System and Reading and Understanding the CMake in apple/swift before continuing – you'll need to know the basics of CMake in order to enjoy this section.
A git grep for "driver-print-jobs" reveals that this option is defined in swift/include/swift/Option/Options.td. In fact, nearly every option supported by the Swift compiler is defined in this file. And in that same directory, a CMakeLists.txt file defines how these options are transformed by TableGen:
swift/include/swift/Option/CMakeLists.txt
    set(LLVM_TARGET_DEFINITIONS Options.td)
    swift_tablegen(Options.inc -gen-opt-parser-defs)
    swift_add_public_tablegen_target(SwiftOptions)
Note that, unlike CMake functions that take their input files as arguments, the swift_tablegen CMake macro requires its input files be specified using a global variable named LLVM_TARGET_DEFINITIONS. The macro also allows arbitrary options to be passed to the llvm-tblgen executable. In this case, that's -gen-opt-parser-defs – the option that transforms TableGen definitions into OPTION macro calls.
swift_tablegen and swift_add_public_tablegen_target are CMake macros defined in swift/cmake/modules/AddSwiftTableGen.cmake. These two macros call through to the LLVM CMake functions tablegen and add_public_tablegen_target:
- The LLVM CMake function tablegen instructs CMake on how to produce the specified output file (Options.inc, in this case).
- The LLVM CMake function add_public_tablegen_target creates a public CMake target that depends on the production of the Options.inc file.
Here's the tablegen LLVM CMake function. It uses the built-in CMake function add_custom_command to define how llvm-tblgen should be run:
llvm/cmake/modules/TableGen.cmake
    function(tablegen project ofn)
      # ...
      if (IS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
        set(LLVM_TARGET_DEFINITIONS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
      else()
        set(LLVM_TARGET_DEFINITIONS_ABSOLUTE
          ${CMAKE_CURRENT_SOURCE_DIR}/${LLVM_TARGET_DEFINITIONS})
      endif()
      # ...
      add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
        # ...
        COMMAND ${${project}_TABLEGEN_EXE} ${ARGN} -I ${CMAKE_CURRENT_SOURCE_DIR}
        ${LLVM_TABLEGEN_FLAGS}
        ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
        # ...
        COMMENT "Building ${ofn}..."
        )
      add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
        # Only update the real output file if there are any differences.
        # This prevents recompilation of all the files depending on it if there
        # aren't any.
        COMMAND ${CMAKE_COMMAND} -E copy_if_different
            ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
            ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
        DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
        COMMENT "Updating ${ofn}..."
        )
      # ...
      set(TABLEGEN_OUTPUT ${TABLEGEN_OUTPUT} ${CMAKE_CURRENT_BINARY_DIR}/${ofn} PARENT_SCOPE)
      # ...
    endfunction()
And here's the add_public_tablegen_target LLVM CMake function:
llvm/cmake/modules/TableGen.cmake
    function(add_public_tablegen_target target)
      # ...
      add_custom_target(${target}
        DEPENDS ${TABLEGEN_OUTPUT})
      # ...
    endfunction()
The add_public_tablegen_target function calls the built-in CMake function add_custom_target in order to define a CMake target. In this case, the name of that target is SwiftOptions. I can manually generate the TableGen output for SwiftOptions by invoking cmake --build on the command line, specifying SwiftOptions as the target I'd like to build:
    cmake --build \
      /path/to/build/swift-macosx-x86_64 \
      --target SwiftOptions
The above command does essentially the same thing as the manual llvm-tblgen -gen-opt-parser-defs invocation shown earlier in this article.
Instead of having users manually build the SwiftOptions target, the CMake targets for several Swift compiler libraries, such as libswiftOption and libswiftDriver, specify a dependency upon SwiftOptions. For example, here's libswiftOption:
swift/lib/Option/CMakeLists.txt
    add_swift_library(swiftOption STATIC
      # ...
      DEPENDS SwiftOptions
As a result of this declared dependency, building libswiftOption results in SwiftOptions being built first, which means llvm-tblgen is run on swift/include/swift/Option/Options.td in order to produce the file /path/to/build/swift-macosx-x86_64/include/swift/Option/Options.inc. And again, because the swift/include/swift/Option/CMakeLists.txt file specifies that llvm-tblgen be invoked with the -gen-opt-parser-defs argument, the Options.inc file is populated with one call to an OPTION macro for each option defined in the original Options.td file.
Stage 2: The libswiftOption headers declare an enum containing a case for each of the options
In stage one, llvm-tblgen transformed the Options.td file into a file named Options.inc. For each option defined in the original Options.td, the new Options.inc file contains an invocation of a macro named OPTION:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
This is useful because I can define the OPTION macro to do whatever I want, then #include "Options.inc" in order to have the macro invoked once for each option.
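To try the pattern in isolation, here's a minimal sketch. A standalone example can't include the generated Options.inc, so a hypothetical DEMO_OPTIONS macro stands in for that file, and its OPTION calls take three made-up parameters instead of the twelve shown above. DEMO_OPTIONS and demoHelpText are names I invented for illustration; they aren't part of Swift or LLVM.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Options.inc: one OPTION(...) call per option.
// The real file is generated by llvm-tblgen and passes twelve arguments.
#define DEMO_OPTIONS                                         \
  OPTION("driver-print-jobs", driver_print_jobs,             \
         "Dump list of jobs to execute")                     \
  OPTION("driver-print-actions", driver_print_actions,       \
         "Dump list of actions to perform")

// Define OPTION to do "whatever I want": here, it appends one line of
// help text per option. The ID parameter is simply ignored.
inline std::vector<std::string> demoHelpText() {
  std::vector<std::string> Lines;
#define OPTION(NAME, ID, HELPTEXT) \
  Lines.push_back(std::string("-") + NAME + ": " + HELPTEXT);
  DEMO_OPTIONS
#undef OPTION
  return Lines;
}
```

Calling demoHelpText() returns one string per OPTION call, in the order the calls appear in the list. Adding a third OPTION line to DEMO_OPTIONS would add a third string without touching the function, which is the property the real headers rely on.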
This is exactly what the libswiftOption header Options.h does in order to define the swift::options::ID enum. That enum contains a case for each option defined in Options.td. Without TableGen, that would mean manually listing each of these options out in the enum, like this:
    namespace swift {
    namespace options {
    enum ID {
      OPT_driver_print_jobs,
      OPT_driver_print_actions,
      OPT_driver_skip_execution,
      OPT_driver_use_frontend_path,
      // ...182 more options.
    };
Every time a Swift compiler developer wanted to add or remove a Swift option, they'd have to manually add or remove it from that enum. That's not only tedious, it's also error-prone.
Instead, the swift/include/swift/Option/Options.h header defines the OPTION macro such that it pastes the prefix OPT_ onto each option's identifier, producing a token of the form OPT_<option-identifier> for each definition in Options.inc:
swift/include/swift/Option/Options.h
    namespace swift {
    namespace options {
    // ...
    enum ID {
      OPT_INVALID = 0,
    #define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \
                   HELPTEXT, METAVAR, VALUES) \
      OPT_##ID,
    #include "swift/Option/Options.inc"
      LastOption
    #undef OPTION
    };
This results in an enum with several hundred cases, one for each option defined in Options.inc. Note that the macro definition of OPTION above only uses the third argument passed into it, which it calls ID. For -driver-print-jobs, that third parameter is driver_print_jobs:
    OPTION(prefix_1, "driver-print-jobs", driver_print_jobs, Flag,
           internal_debug_Group, INVALID, nullptr,
           HelpHidden | DoesNotAffectIncrementalBuild, 0,
           "Dump list of jobs to execute", nullptr, nullptr)
Stage 3: libswiftOption instantiates an llvm::opt::OptTable, with an llvm::opt::OptTable::Info element for each of the options
LLVM provides a library, named libLLVMOption, that encapsulates common operations related to command-line argument parsing. For instance, libLLVMOption defines the llvm::opt::OptTable and llvm::opt::InputArgList classes:
- The OptTable class is instantiated with a list of options to parse, and provides a method named ParseArgs. This method takes the list of strings passed into an executable, compares them to the options the OptTable is supposed to parse, and returns an InputArgList (as well as a list of arguments that could not be parsed).
- InputArgList defines a method named hasArg. This method can be used to check whether an option was specified.
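To make that division of labor concrete, here's a toy sketch of the two roles. It is not LLVM's implementation: toy::Table and toy::ParsedArgs are hypothetical stand-ins for OptTable and InputArgList, they only understand flags with a "-" prefix, and their method signatures are my own simplifications rather than the real ParseArgs and hasArg signatures.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

namespace toy {
enum ID { OPT_INVALID = 0, OPT_incremental, OPT_whole_module_optimization };

// One entry per option the table knows how to parse.
struct Info {
  const char *Name;
  ID Id;
};

// Hypothetical stand-in for InputArgList: records which options were seen.
struct ParsedArgs {
  std::vector<ID> Matched;
  std::vector<std::string> Unknown;

  bool hasArg(ID Id) const {
    for (ID Seen : Matched)
      if (Seen == Id)
        return true;
    return false;
  }
};

// Hypothetical stand-in for OptTable.
class Table {
  std::vector<Info> Infos;

public:
  explicit Table(std::vector<Info> Known) : Infos(std::move(Known)) {}

  // Compares each argument string against every known option name.
  ParsedArgs parseArgs(const std::vector<std::string> &Args) const {
    ParsedArgs Result;
    for (const std::string &Arg : Args) {
      ID Found = OPT_INVALID;
      for (const Info &I : Infos)
        if (Arg == std::string("-") + I.Name)
          Found = I.Id;
      if (Found == OPT_INVALID)
        Result.Unknown.push_back(Arg);
      else
        Result.Matched.push_back(Found);
    }
    return Result;
  }
};
} // namespace toy
```

With a table that knows about incremental and whole-module-optimization, parsing the strings -incremental and -bogus yields a result where hasArg(OPT_incremental) is true and -bogus lands in Unknown. The table owns the knowledge of which options exist; the parsed result only answers questions about one particular invocation.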
libswiftOption is responsible for instantiating an llvm::opt::OptTable that's used to parse arguments passed to the swift executable. The llvm::opt::OptTable constructor takes an array of options it's supposed to parse. These options are represented using the llvm::opt::OptTable::Info struct:
llvm/include/llvm/Option/OptTable.h
    class OptTable {
      // ...
      struct Info {
        // ...
        const char *const *Prefixes;
        const char *Name;
        const char *HelpText;
        const char *MetaVar;
        unsigned ID;
        unsigned char Kind;
        unsigned char Param;
        unsigned short Flags;
        unsigned short GroupID;
        unsigned short AliasID;
        const char *AliasArgs;
        const char *Values;
      };
      // ...
    protected:
      OptTable(ArrayRef<Info> OptionInfos, bool IgnoreCase = false);
In order to create this array of options, libswiftOption once again defines the OPTION macro and includes the Options.inc file. This time, it defines the OPTION macro such that it list-initializes an OptTable::Info struct for each call, and it uses those to statically define an array named InfoTable[]:
swift/lib/Option/Options.cpp
    #include "swift/Option/Options.h"
    // ...
    #include "llvm/Option/OptTable.h"
    #include "llvm/Option/Option.h"

    using namespace swift::options;
    using namespace llvm::opt;
    // ...
    static const OptTable::Info InfoTable[] = {
    #define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM, \
                   HELPTEXT, METAVAR, VALUES) \
      {PREFIX, NAME, HELPTEXT, METAVAR, OPT_##ID, Option::KIND##Class, \
       PARAM, FLAGS, OPT_##GROUP, OPT_##ALIAS, ALIASARGS, VALUES},
    #include "swift/Option/Options.inc"
    #undef OPTION
    };
After pre-processing, the OPTION macro calls are expanded such that the InfoTable[] initializer above looks like this:
    static const OptTable::Info InfoTable[] = {
      {{"-", nullptr}, "driver-print-jobs", "Dump list of jobs to execute",
       nullptr, OPT_driver_print_jobs, Option::FlagClass, 0,
       HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group,
       OPT_INVALID, nullptr, nullptr},
      {{"-", nullptr}, "driver-print-actions", "Dump list of actions to perform",
       nullptr, OPT_driver_print_actions, Option::FlagClass, 0,
       HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group,
       OPT_INVALID, nullptr, nullptr},
      // ...184 more options.
    };
Note that the enum values from stage two are used in these static option definitions: OPT_driver_print_jobs, OPT_driver_print_actions, and so on.
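Here's a self-contained sketch of how one list of OPTION calls can feed both the enum from stage two and the table from this stage. The TABLE_DEMO_OPTIONS macro is a hypothetical stand-in for Options.inc, reduced to four parameters, and demo::Info is a cut-down imitation of OptTable::Info. The getInfo helper, and the convention that entry (ID - 1) describes option ID, belong to this sketch only; I'm not claiming anything about how LLVM looks entries up.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Options.inc, reduced to four parameters.
#define TABLE_DEMO_OPTIONS                                      \
  OPTION("driver-print-jobs", driver_print_jobs, Flag,          \
         "Dump list of jobs to execute")                        \
  OPTION("driver-print-actions", driver_print_actions, Flag,    \
         "Dump list of actions to perform")

namespace demo {
enum KindClass { FlagClass, JoinedClass };

// First expansion: one enumerator per option.
enum ID {
  OPT_INVALID = 0,
#define OPTION(NAME, ID, KIND, HELPTEXT) OPT_##ID,
  TABLE_DEMO_OPTIONS
#undef OPTION
  LastOption
};

struct Info {
  const char *Name;
  const char *HelpText;
  unsigned ID;
  unsigned char Kind;
};

// Second expansion: one Info per option, reusing the enumerators above.
static const Info InfoTable[] = {
#define OPTION(NAME, ID, KIND, HELPTEXT) \
  {NAME, HELPTEXT, OPT_##ID, KIND##Class},
  TABLE_DEMO_OPTIONS
#undef OPTION
};

// In this sketch IDs start at 1, so entry (Id - 1) describes that option.
inline const Info &getInfo(ID Id) { return InfoTable[Id - 1]; }
} // namespace demo
```

Because both expansions walk the same list in the same order, the enum and the table can't drift out of sync: demo::getInfo(demo::OPT_driver_print_actions).Name is "driver-print-actions" by construction.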
To allow other parts of the Swift compiler to have access to this table of options, libswiftOption creates an llvm::opt::OptTable subclass named SwiftOptTable, and initializes it with the InfoTable[] array from above:
swift/lib/Option/Options.cpp
    class SwiftOptTable : public OptTable {
    public:
      SwiftOptTable() : OptTable(InfoTable) {}
    };
It also defines a function that lets other libraries create an instance of the option table and take ownership of it:
swift/lib/Option/Options.cpp
    std::unique_ptr<OptTable> swift::createSwiftOptTable() {
      return std::unique_ptr<OptTable>(new SwiftOptTable());
    }
Stage 4: libswiftDriver calls the llvm::opt::OptTable::ParseArgs method
As I briefly mentioned in the previous article in this series, the swift::Driver constructor calls the createSwiftOptTable function in order to get an instance of the llvm::opt::OptTable subclass SwiftOptTable, storing it in a member variable named Driver::Opts:
swift/lib/Driver/Driver.cpp
    Driver::Driver(StringRef DriverExecutable,
                   StringRef Name,
                   ArrayRef<const char *> Args,
                   DiagnosticEngine &Diags)
      : Opts(createSwiftOptTable()), Diags(Diags),
Then, the Driver has the command-line arguments to the swift executable parsed, in order to create and configure an instance of swift::Compilation. To parse the arguments, it calls the llvm::opt::OptTable::ParseArgs method:
swift/lib/Driver/Driver.cpp
    std::unique_ptr<InputArgList>
    Driver::parseArgStrings(ArrayRef<const char *> Args) {
      // ...
      ArgList = llvm::make_unique<InputArgList>(
          getOpts().ParseArgs(Args, MissingArgIndex, MissingArgCount,
                              IncludedFlagsBitmask, ExcludedFlagsBitmask));
      // ...
      return ArgList;
    }
The InputArgList returned by the Driver::parseArgStrings method is used throughout libswiftDriver. For example, the previous article in this series showcased the following code, which produced a warning if the -incremental and -whole-module-optimization arguments were used in the same invocation:
swift/lib/Driver/Driver.cpp
    bool Incremental = ArgList->hasArg(options::OPT_incremental);
    if (ArgList->hasArg(options::OPT_whole_module_optimization)) {
      if (Incremental && ShowIncrementalBuildDecisions) {
        llvm::outs() << "Incremental compilation has been disabled, because it "
                     << "is not compatible with whole module optimization.";
      }
      Incremental = false;
    }
The code above uses the llvm::opt::InputArgList::hasArg method to check for the OPT_incremental and OPT_whole_module_optimization arguments. Remember, these were cases that were added to the swift::options::ID enum by including Options.inc in stage two above! It all comes together here.
Besides checking for the existence of certain options, InputArgList also has methods to grab the values specified by those options. Here's how libswiftFrontend grabs the value from the popular -warn-long-function-bodies= command-line option, by using the llvm::opt::InputArgList::getLastArg and llvm::opt::Arg::getValue methods:
swift/lib/Frontend/CompilerInvocation.cpp
    if (const Arg *A = Args.getLastArg(OPT_warn_long_function_bodies)) {
      unsigned attempt;
      if (StringRef(A->getValue()).getAsInteger(10, attempt)) {
        Diags.diagnose(SourceLoc(), diag::error_invalid_arg_value,
                       A->getAsString(Args), A->getValue());
      } else {
        Opts.WarnLongFunctionBodies = attempt;
      }
    }
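One detail in that snippet is easy to misread: StringRef::getAsInteger returns true when parsing fails, which is why the diagnostic sits in the if branch and the assignment in the else branch. Here's a standalone sketch of the same two steps, "take the last matching argument" and "parse its value as an unsigned integer", using only the standard library. The functions toy::lastJoinedValue and toy::parseUnsigned are hypothetical helpers I wrote for illustration; they imitate the idea, not LLVM's implementation.

```cpp
#include <cassert>
#include <climits>
#include <string>
#include <vector>

namespace toy {
// Returns a pointer to the value of the LAST argument that starts with
// Prefix, or nullptr if no argument matches. The pointer refers into
// Args, so it is only valid while Args is alive and unmodified.
inline const char *lastJoinedValue(const std::vector<std::string> &Args,
                                   const std::string &Prefix) {
  const char *Value = nullptr;
  for (const std::string &Arg : Args)
    if (Arg.compare(0, Prefix.size(), Prefix) == 0)
      Value = Arg.c_str() + Prefix.size();
  return Value;
}

// Parses a base-10 unsigned integer. Returns true on FAILURE, the same
// convention the getAsInteger call above relies on.
inline bool parseUnsigned(const std::string &Text, unsigned &Result) {
  if (Text.empty())
    return true;
  unsigned long long Acc = 0;
  for (char C : Text) {
    if (C < '0' || C > '9')
      return true;
    Acc = Acc * 10 + static_cast<unsigned>(C - '0');
    if (Acc > UINT_MAX)
      return true; // Value does not fit in an unsigned.
  }
  Result = static_cast<unsigned>(Acc);
  return false;
}
} // namespace toy
```

Given the arguments -warn-long-function-bodies=50, -O, and -warn-long-function-bodies=100, lastJoinedValue returns the text 100 because the later occurrence wins, and parseUnsigned stores 100 and returns false. Passing abc instead makes parseUnsigned return true, which corresponds to the branch where the real code emits error_invalid_arg_value.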
Sorry, no magic here
I enjoy looking into the details of the Swift compiler because – and maybe this sounds silly – it helps me better understand that it's "just a program".
Because I was unfamiliar with the LLVM TableGen utility, and with the C/C++ macros that the Swift compiler uses to define its options, it seemed like magic to me that modifying the Options.td file would result in changes to Swift's command-line options. But it's not magic – as this article described, it's a four-stage process in which:
- The Options.td file is transformed by TableGen.
- The transformed file, Options.inc, is included such that it defines a large enum with all the Swift options as values.
- The transformed file is included again, this time to initialize an LLVM OptTable. This class is capable of searching command-line arguments for option values.
- The rest of the Swift compiler codebase uses the InputArgList returned by the LLVM OptTable class to check for arguments as necessary.