Option Parsing in the Swift Compiler

The last article in this series explained how libswiftDriver splits swift executable invocations into smaller sub-jobs. Parsing command-line arguments is a big part of that work, but I didn't go into much detail about it. This article explains command-line argument parsing in depth.

Specifically, this article describes how:

  1. Within the Swift build system, LLVM TableGen is used to transform the options specified in swift/include/swift/Option/Options.td. The transformed output is written to a file named Options.inc.
  2. A libswiftOption header includes the Options.inc file in order to define an enum named swift::options::ID. By defining a macro before including the Options.inc file, the header is able to define an enum case for each option defined in the original Options.td file: swift::options::ID::OPT_driver_print_jobs, swift::options::ID::OPT_driver_print_actions, and so on.
  3. In its implementation, libswiftOption defines a macro, then includes the Options.inc file a second time, this time in order to initialize an llvm::opt::OptTable. The OptTable class is defined in libLLVMOption and provides argument parsing utilities.
  4. In its swift::Driver::buildCompilation method, libswiftDriver calls the llvm::opt::OptTable::ParseArgs method. This takes the array of strings passed into the swift compiler executable's main function as an argument, and it returns an llvm::opt::InputArgList. This class defines the methods that the rest of the Swift compiler codebase uses to query the parsed arguments. The InputArgList::hasArg method, which checks for the presence of an argument, is perhaps the most common.

Many contributions to the Swift compiler involve modifying or adding command-line options. Understanding how these options are parsed has helped me make such contributions.

An introduction to LLVM TableGen

TableGen is a utility program for LLVM developers. An llvm-tblgen executable is built as part of LLVM, and can be found in the LLVM build directory of a Swift build tree, in /path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen.

TableGen is documented in detail in the LLVM documentation, but for my purposes it's sufficient to understand it as a tool that transforms the syntax in Swift's option files – swift/include/swift/Option/Options.td, swift/include/swift/Option/FrontendOptions.td, and swift/tools/SourceKit/tools/sourcekitd-test/Options.td – into a syntax that looks like C macro invocations.

For example, the -driver-print-jobs compiler option, mentioned in several past articles, is defined in Options.td like so:

swift/include/swift/Option/Options.td

include "llvm/Option/OptParser.td"
..
def driver_print_jobs : Flag<["-"], "driver-print-jobs">, InternalDebugOpt,
  HelpText<"Dump list of jobs to execute">;

I can use llvm-tblgen to transform the file in which it's defined:

/path/to/build/llvm-macosx-x86_64/bin/llvm-tblgen \
    -I ~/local/Source/apple/llvm/include \
    -I ~/local/Source/apple/swift/include/swift/Option \
    ~/local/Source/apple/swift/include/swift/Option/Options.td \
    -gen-opt-parser-defs

This takes all of the options in Options.td, and outputs them as calls to a C macro named OPTION. For example, here's the output that corresponds to the -driver-print-jobs definition shown above:

OPTION(
  prefix_1,
  "driver-print-jobs",
  driver_print_jobs,
  Flag,
  internal_debug_Group,
  INVALID,
  nullptr,
  HelpHidden | DoesNotAffectIncrementalBuild,
  0,
  "Dump list of jobs to execute",
  nullptr,
  nullptr)

The LLVM TableGen executable llvm-tblgen can output many different formats. I instructed it to output calls to the OPTION macro, by using the -gen-opt-parser-defs argument. Other arguments include -gen-ctags, which generates definitions and source locations for the popular source indexing utility ctags, and -print-records, which prints TableGen's internal representation of each entry.

With that prerequisite explanation of LLVM TableGen out of the way, we're ready to look at the four stages of how arguments are parsed in the Swift compiler.

Stage 1: Swift's CMake instructs TableGen to transform Options.td into Options.inc

If you haven't already, try reading "The Swift Compiler's Build System" and "Reading and Understanding the CMake in apple/swift" before continuing – you'll need to know the basics of CMake in order to enjoy this section.

A git grep for "driver-print-jobs" reveals that this option is defined in swift/include/swift/Option/Options.td. In fact, nearly every option supported by the Swift compiler is defined in this file. And in that same directory, a CMakeLists.txt file defines how these options are transformed by TableGen:

swift/include/swift/Option/CMakeLists.txt

set(LLVM_TARGET_DEFINITIONS Options.td)
swift_tablegen(Options.inc -gen-opt-parser-defs)
swift_add_public_tablegen_target(SwiftOptions)

Note that, unlike CMake functions that take their input files as arguments, the swift_tablegen CMake macro requires that its input file be specified using a global variable named LLVM_TARGET_DEFINITIONS. The macro also allows arbitrary options to be passed to the llvm-tblgen executable. In this case, that's -gen-opt-parser-defs – the option that transforms TableGen definitions into OPTION macro calls.

swift_tablegen and swift_add_public_tablegen_target are CMake macros defined in swift/cmake/modules/AddSwiftTableGen.cmake. These two macros call through to the LLVM CMake functions tablegen and add_public_tablegen_target:

  1. The LLVM CMake function tablegen instructs CMake on how to produce the specified output file (Options.inc, in this case).
  2. The LLVM CMake function add_public_tablegen_target creates a public CMake target that depends on the production of the Options.inc file.

Here's the tablegen LLVM CMake function. It uses the built-in CMake function add_custom_command to define how llvm-tblgen should be run:

llvm/cmake/modules/TableGen.cmake

function(tablegen project ofn)
..
  if (IS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
    set(LLVM_TARGET_DEFINITIONS_ABSOLUTE ${LLVM_TARGET_DEFINITIONS})
  else()
    set(LLVM_TARGET_DEFINITIONS_ABSOLUTE
      ${CMAKE_CURRENT_SOURCE_DIR}/${LLVM_TARGET_DEFINITIONS})
  endif()
..
  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
..
    COMMAND ${${project}_TABLEGEN_EXE} ${ARGN} -I ${CMAKE_CURRENT_SOURCE_DIR}
    ${LLVM_TABLEGEN_FLAGS}
    ${LLVM_TARGET_DEFINITIONS_ABSOLUTE}
..
    COMMENT "Building ${ofn}..."
    )
  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
    # Only update the real output file if there are any differences.
    # This prevents recompilation of all the files depending on it if there
    # aren't any.
    COMMAND ${CMAKE_COMMAND} -E copy_if_different
        ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
        ${CMAKE_CURRENT_BINARY_DIR}/${ofn}
    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${ofn}.tmp
    COMMENT "Updating ${ofn}..."
    )
..
  set(TABLEGEN_OUTPUT ${TABLEGEN_OUTPUT} ${CMAKE_CURRENT_BINARY_DIR}/${ofn} PARENT_SCOPE)
..
endfunction()

And here's the add_public_tablegen_target LLVM CMake function:

llvm/cmake/modules/TableGen.cmake

function(add_public_tablegen_target target)
...
  add_custom_target(${target}
    DEPENDS ${TABLEGEN_OUTPUT})
...
endfunction()

The add_public_tablegen_target function calls the built-in CMake function add_custom_target in order to define a CMake target. In this case, the name of that target is SwiftOptions. I can manually run TableGen for SwiftOptions by invoking cmake --build on the command line, specifying SwiftOptions as the target I'd like to build:

cmake --build \
    /path/to/build/swift-macosx-x86_64 \
    --target SwiftOptions

The above command does essentially the same thing as the manual llvm-tblgen -gen-opt-parser-defs invocation shown earlier in this article.

Instead of having users manually build the SwiftOptions target, the CMake targets for several Swift compiler libraries, such as libswiftOption and libswiftDriver, specify a dependency upon SwiftOptions. For example, here's libswiftOption:

swift/lib/Option/CMakeLists.txt

add_swift_library(swiftOption STATIC
..
  DEPENDS SwiftOptions

As a result of this declared dependency, building libswiftOption results in SwiftOptions being built first, which means llvm-tblgen is run on swift/include/swift/Option/Options.td in order to produce the file /path/to/build/swift-macosx-x86_64/include/swift/Option/Options.inc. And again, because the swift/include/swift/Option/CMakeLists.txt file specifies that llvm-tblgen be invoked with the -gen-opt-parser-defs argument, the Options.inc file is populated with one call to an OPTION macro for each option defined in the original Options.td file.

Stage 2: The libswiftOption headers declare an enum containing a case for each of the options

In stage one, llvm-tblgen transformed the Options.td file into a file named Options.inc. For each option defined in the original Options.td, the new Options.inc file contains an invocation of a macro named OPTION:

OPTION(
  prefix_1,
  "driver-print-jobs",
  driver_print_jobs,
  Flag,
  internal_debug_Group,
  INVALID,
  nullptr,
  HelpHidden | DoesNotAffectIncrementalBuild,
  0,
  "Dump list of jobs to execute",
  nullptr,
  nullptr)

This is useful because I can define the OPTION macro to do whatever I want, then #include "Options.inc" in order to have the macro invoked once for each option.

This is exactly what the libswiftOption header Options.h does in order to define the swift::options::ID enum. That enum contains a case for each option defined in Options.td. Without TableGen, that would mean manually listing each of these options out in the enum, like this:

namespace swift {
namespace options {

enum ID {
  OPT_driver_print_jobs,
  OPT_driver_print_actions,
  OPT_driver_skip_execution,
  OPT_driver_use_frontend_path,
  // ...182 more options.
};

Every time a Swift compiler developer wanted to add or remove a Swift option, they'd have to manually add or remove it from that enum. That's not only tedious, it's also error-prone.

Instead, the swift/include/swift/Option/Options.h header defines the OPTION macro such that, for each definition in Options.inc, it pastes the OPT_ prefix onto the option's identifier to form an enum case named OPT_<option-identifier>:

swift/include/swift/Option/Options.h

namespace swift {
namespace options {
..
  enum ID {
    OPT_INVALID = 0,
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  OPT_##ID,
#include "swift/Option/Options.inc"
    LastOption
#undef OPTION
  };

This results in a large enum, with one case for each option defined in Options.inc. Note that the macro definition of OPTION above only uses the third argument passed into it, which it calls ID. For -driver-print-jobs, that third argument is driver_print_jobs:

OPTION(
  prefix_1,
  "driver-print-jobs",
  driver_print_jobs,
  Flag,
  internal_debug_Group,
  INVALID,
  nullptr,
  HelpHidden | DoesNotAffectIncrementalBuild,
  0,
  "Dump list of jobs to execute",
  nullptr,
  nullptr)
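The technique can be tried outside of the Swift codebase. Here's a minimal sketch, assuming a made-up list of three options. A standalone example can't #include a generated Options.inc, so a macro named DEMO_OPTIONS stands in for that file, and its OPTION entries take only two of the twelve real arguments. DEMO_OPTIONS, the demo namespace, and the indexOf helper are my own inventions for illustration; none of them exist in Swift or LLVM.

```cpp
#include <cassert>

// Hypothetical stand-in for the generated Options.inc file. In the Swift
// build this list is produced by TableGen; here it's written by hand.
#define DEMO_OPTIONS(X)                                                        \
  X("driver-print-jobs", driver_print_jobs)                                    \
  X("driver-print-actions", driver_print_actions)                              \
  X("incremental", incremental)

namespace demo {

enum ID {
  OPT_INVALID = 0,
// Expand each entry into one enumerator, using only the ID argument.
#define OPTION(NAME, ID) OPT_##ID,
  DEMO_OPTIONS(OPTION)
#undef OPTION
  LastOption
};

// Maps an option ID onto a zero-based index, skipping OPT_INVALID.
int indexOf(ID Opt) {
  assert(Opt > OPT_INVALID && Opt < LastOption && "not a real option");
  return Opt - 1;
}

} // namespace demo
```

Adding a fourth entry to DEMO_OPTIONS adds a fourth enumerator automatically, and LastOption moves along with it. That's the property that makes the generated enum so much less error-prone than a handwritten one.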

Stage 3: libswiftOption instantiates an llvm::opt::OptTable, with an llvm::opt::OptTable::Info element for each of the options

LLVM provides a library, named libLLVMOption, that encapsulates common operations related to command-line argument parsing. For instance, libLLVMOption defines the llvm::opt::OptTable and llvm::opt::InputArgList classes.

libswiftOption is responsible for instantiating an llvm::opt::OptTable that's used to parse arguments passed to the swift executable. The llvm::opt::OptTable constructor takes an array of the options it's supposed to parse. These options are represented using the llvm::opt::OptTable::Info struct:

llvm/include/llvm/Option/OptTable.h

class OptTable {
..
  struct Info {
..
    const char *const *Prefixes;
    const char *Name;
    const char *HelpText;
    const char *MetaVar;
    unsigned ID;
    unsigned char Kind;
    unsigned char Param;
    unsigned short Flags;
    unsigned short GroupID;
    unsigned short AliasID;
    const char *AliasArgs;
    const char *Values;
  };
..
protected:
  OptTable(ArrayRef<Info> OptionInfos, bool IgnoreCase = false);

In order to create this array of options, libswiftOption once again defines the OPTION macro and includes the Options.inc file. This time, it defines the OPTION macro such that it list-initializes an OptTable::Info struct for each call, and it uses those to statically define an array named InfoTable[]:

swift/lib/Option/Options.cpp

#include "swift/Option/Options.h"
..
#include "llvm/Option/OptTable.h"
#include "llvm/Option/Option.h"

using namespace swift::options;
using namespace llvm::opt;
..
static const OptTable::Info InfoTable[] = {
#define OPTION(PREFIX, NAME, ID, KIND, GROUP, ALIAS, ALIASARGS, FLAGS, PARAM,  \
               HELPTEXT, METAVAR, VALUES)                                      \
  {PREFIX, NAME,  HELPTEXT,    METAVAR,     OPT_##ID,  Option::KIND##Class,    \
   PARAM,  FLAGS, OPT_##GROUP, OPT_##ALIAS, ALIASARGS, VALUES},
#include "swift/Option/Options.inc"
#undef OPTION
};

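After pre-processing, the OPTION macro calls are expanded such that the InfoTable[] initializer above looks roughly like this. For readability, the prefix_1 array that each entry refers to is shown as its contents, {"-", nullptr}: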

static const OptTable::Info InfoTable[] = {
  {{"-", nullptr}, "driver-print-jobs", "Dump list of jobs to execute", nullptr, OPT_driver_print_jobs, Option::FlagClass, 0, HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group, OPT_INVALID, nullptr, nullptr},
  {{"-", nullptr}, "driver-print-actions", "Dump list of actions to perform", nullptr, OPT_driver_print_actions, Option::FlagClass, 0, HelpHidden | DoesNotAffectIncrementalBuild, OPT_internal_debug_Group, OPT_INVALID, nullptr, nullptr},
  // ...184 more options.
};

Note that the enum values from stage two are used in these static option definitions: OPT_driver_print_jobs, OPT_driver_print_actions, and so on.
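To see both expansions working together, here's a small sketch in which one option list is expanded twice: once into an enum, and once into a table of structs. It assumes a cut-down Info struct with three fields rather than the twelve in llvm::opt::OptTable::Info. DEMO_OPTIONS, getInfo, and findByName are hypothetical names I made up for this example, not functions from Swift or LLVM.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Options.inc: (name, identifier, help text).
#define DEMO_OPTIONS(X)                                                        \
  X("driver-print-jobs", driver_print_jobs, "Dump list of jobs to execute")    \
  X("driver-print-actions", driver_print_actions,                              \
    "Dump list of actions to perform")

namespace demo {

// First expansion: one enumerator per option.
enum ID {
  OPT_INVALID = 0,
#define OPTION(NAME, ID, HELPTEXT) OPT_##ID,
  DEMO_OPTIONS(OPTION)
#undef OPTION
  LastOption
};

// A cut-down version of an option record.
struct Info {
  const char *Name;
  const char *HelpText;
  unsigned ID;
};

// Second expansion: one list-initialized Info per option, in the same order.
static const Info InfoTable[] = {
#define OPTION(NAME, ID, HELPTEXT) {NAME, HELPTEXT, OPT_##ID},
    DEMO_OPTIONS(OPTION)
#undef OPTION
};

// Because both expansions share one list, enumerator N lives at index N - 1.
const Info &getInfo(ID Opt) {
  assert(Opt > OPT_INVALID && Opt < LastOption && "not a real option");
  return InfoTable[Opt - 1];
}

// Linear search by option name; returns nullptr if nothing matches.
const Info *findByName(const char *Name) {
  for (const Info &I : InfoTable)
    if (std::strcmp(I.Name, Name) == 0)
      return &I;
  return nullptr;
}

} // namespace demo
```

The enum and the table can't drift out of sync in this arrangement, because neither is written by hand.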

To allow other parts of the Swift compiler to have access to this table of options, libswiftOption creates an llvm::opt::OptTable subclass named SwiftOptTable, and initializes it with the InfoTable[] array from above:

swift/lib/Option/Options.cpp

class SwiftOptTable : public OptTable {
public:
  SwiftOptTable() : OptTable(InfoTable) {}
};

It also defines a function that allows other libraries to create an instance of the option table:

swift/lib/Option/Options.cpp

std::unique_ptr<OptTable> swift::createSwiftOptTable() {
  return std::unique_ptr<OptTable>(new SwiftOptTable());
}
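The shape of this arrangement (a base class with a protected constructor, a subclass that feeds it a static table, and a factory function that returns the base type) can be sketched without LLVM. In the sketch below, the OptTable class is a simplified stand-in I wrote for illustration, not the real llvm::opt::OptTable, and DemoOptTable, createDemoOptTable, getNumOptions, and getOptionName are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

namespace demo {

struct Info {
  const char *Name;
};

// Simplified stand-in for an option table base class. The constructor is
// protected, so instances can only be created through a subclass.
class OptTable {
public:
  // Virtual so that deleting through a base pointer is well-defined here.
  virtual ~OptTable() = default;

  std::size_t getNumOptions() const { return NumInfos; }

  std::string getOptionName(std::size_t Index) const {
    assert(Index < NumInfos && "index out of range");
    return Infos[Index].Name;
  }

protected:
  OptTable(const Info *Infos, std::size_t NumInfos)
      : Infos(Infos), NumInfos(NumInfos) {}

private:
  const Info *Infos;
  std::size_t NumInfos;
};

namespace {

// The table and the subclass are private to this file.
const Info InfoTable[] = {{"driver-print-jobs"}, {"driver-print-actions"}};

class DemoOptTable : public OptTable {
public:
  DemoOptTable()
      : OptTable(InfoTable, sizeof(InfoTable) / sizeof(InfoTable[0])) {}
};

} // end anonymous namespace

// The only public entry point: callers see the base type, never the subclass.
std::unique_ptr<OptTable> createDemoOptTable() {
  return std::unique_ptr<OptTable>(new DemoOptTable());
}

} // namespace demo
```

One consequence of this design is that client code depends only on the factory function and the base class interface, so the table's contents can change without touching any caller.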

Stage 4: libswiftDriver calls the llvm::opt::OptTable::ParseArgs method

As I briefly mentioned in the previous article in this series, the swift::Driver constructor calls the createSwiftOptTable function in order to obtain an instance of the llvm::opt::OptTable subclass SwiftOptTable, storing it in a member variable named Driver::Opts:

swift/lib/Driver/Driver.cpp

Driver::Driver(StringRef DriverExecutable,
               StringRef Name,
               ArrayRef<const char *> Args,
               DiagnosticEngine &Diags)
  : Opts(createSwiftOptTable()), Diags(Diags),

Then, the Driver parses the command-line arguments passed to the swift executable, in order to create and configure an instance of swift::Compilation. To parse the arguments, it calls the llvm::opt::OptTable::ParseArgs method:

swift/lib/Driver/Driver.cpp

std::unique_ptr<InputArgList>
Driver::parseArgStrings(ArrayRef<const char *> Args) {
...
  ArgList = llvm::make_unique<InputArgList>(
      getOpts().ParseArgs(Args, MissingArgIndex, MissingArgCount,
                          IncludedFlagsBitmask, ExcludedFlagsBitmask));
...
  return ArgList;
}

The InputArgList returned by the Driver::parseArgStrings method is used throughout libswiftDriver. For example, the previous article in this series showcased the following code, which disables incremental compilation, and optionally prints a notice, if the -incremental and -whole-module-optimization arguments are used in the same invocation:

swift/lib/Driver/Driver.cpp

bool Incremental = ArgList->hasArg(options::OPT_incremental);
if (ArgList->hasArg(options::OPT_whole_module_optimization)) {
  if (Incremental && ShowIncrementalBuildDecisions) {
    llvm::outs() << "Incremental compilation has been disabled, because it "
                 << "is not compatible with whole module optimization.";
  }
  Incremental = false;
}

The code above uses the llvm::opt::InputArgList::hasArg method to check for the OPT_incremental and OPT_whole_module_optimization arguments. Remember, these were cases that were added to the swift::options::ID enum by including Options.inc in stage two above! It all comes together here.
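To make the flow from strings to enum cases concrete, here's a toy sketch of the idea. It is not how llvm::opt::OptTable::ParseArgs is implemented, and the real parser is considerably more involved. It assumes only two flag options, matches them by exact spelling, and silently skips anything it doesn't recognize. The ArgList class, parseArgs, and shouldBuildIncrementally are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

namespace demo {

enum ID {
  OPT_INVALID = 0,
  OPT_incremental,
  OPT_whole_module_optimization,
  LastOption
};

struct Info {
  const char *Spelling;
  ID Opt;
};

// Handwritten here; in the Swift compiler the table comes from Options.inc.
static const Info InfoTable[] = {
    {"-incremental", OPT_incremental},
    {"-whole-module-optimization", OPT_whole_module_optimization},
};

// Toy stand-in for a parsed argument list: it only remembers which IDs it saw.
class ArgList {
public:
  void append(ID Opt) {
    assert(Opt > OPT_INVALID && Opt < LastOption && "not a real option");
    Seen.push_back(Opt);
  }

  bool hasArg(ID Opt) const {
    for (ID S : Seen)
      if (S == Opt)
        return true;
    return false;
  }

private:
  std::vector<ID> Seen;
};

// Matches each string against the table; unrecognized strings are skipped.
ArgList parseArgs(const std::vector<const char *> &Strings) {
  ArgList Result;
  for (const char *S : Strings) {
    for (const Info &I : InfoTable) {
      if (std::strcmp(S, I.Spelling) == 0) {
        Result.append(I.Opt);
        break;
      }
    }
  }
  return Result;
}

// Mirrors the decision in the driver code quoted above.
bool shouldBuildIncrementally(const ArgList &Args) {
  bool Incremental = Args.hasArg(OPT_incremental);
  if (Args.hasArg(OPT_whole_module_optimization))
    Incremental = false;
  return Incremental;
}

} // namespace demo
```

Passing {"-incremental", "main.swift"} to parseArgs yields a list for which shouldBuildIncrementally returns true, while adding "-whole-module-optimization" makes it return false.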

Besides checking for the existence of certain options, InputArgList also has methods to grab the values specified by those options. Here's how libswiftFrontend grabs the value from the popular -warn-long-function-bodies= command-line option, by using the llvm::opt::InputArgList::getLastArg and llvm::opt::Arg::getValue methods:

swift/lib/Frontend/CompilerInvocation.cpp

if (const Arg *A = Args.getLastArg(OPT_warn_long_function_bodies)) {
  unsigned attempt;
  if (StringRef(A->getValue()).getAsInteger(10, attempt)) {
    Diags.diagnose(SourceLoc(), diag::error_invalid_arg_value,
                   A->getAsString(Args), A->getValue());
  } else {
    Opts.WarnLongFunctionBodies = attempt;
  }
}
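One detail in the code above is easy to misread: llvm::StringRef::getAsInteger returns true when parsing fails, which is why the diagnostic sits in the if branch and the assignment in the else branch. Here's a standalone sketch of that convention. The getAsUnsigned function is my own simplified approximation (digits only, radix of at most 10), not LLVM's implementation, and the Options struct and applyWarnLongFunctionBodies are hypothetical names.

```cpp
#include <cassert>
#include <limits>
#include <string>

namespace demo {

// Parses Text as an unsigned integer in the given radix. Following the
// convention described above, returns true on FAILURE and false on success.
// Result is only written on success.
bool getAsUnsigned(const std::string &Text, unsigned Radix, unsigned &Result) {
  assert(Radix >= 2 && Radix <= 10 && "this sketch only handles digits 0-9");
  if (Text.empty())
    return true;

  const unsigned Max = std::numeric_limits<unsigned>::max();
  unsigned Value = 0;
  for (char C : Text) {
    int Digit = C - '0';
    if (Digit < 0 || Digit >= static_cast<int>(Radix))
      return true; // Not a digit in this radix.
    if (Value > (Max - static_cast<unsigned>(Digit)) / Radix)
      return true; // The next step would overflow.
    Value = Value * Radix + static_cast<unsigned>(Digit);
  }
  Result = Value;
  return false;
}

struct Options {
  unsigned WarnLongFunctionBodies = 0;
};

// Returns true if the value was accepted. A real caller would emit a
// diagnostic in the failure case instead of just returning false.
bool applyWarnLongFunctionBodies(const std::string &Value, Options &Opts) {
  unsigned Attempt;
  if (getAsUnsigned(Value, 10, Attempt))
    return false;
  Opts.WarnLongFunctionBodies = Attempt;
  return true;
}

} // namespace demo
```

With this sketch, a value of "250" is accepted and stored, while "2x" is rejected and leaves the previously stored value untouched.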

Sorry, no magic here

I enjoy looking into the details of the Swift compiler because – and maybe this sounds silly – it helps me better understand that it's "just a program".

Because I was unfamiliar with the LLVM TableGen utility, and with the C/C++ macros that the Swift compiler uses to define its options, it seemed like magic to me that modifying the Options.td file would result in changes to Swift's command-line options. But it's not magic – as this article described, it's a four-stage process in which:

  1. The Options.td file is transformed by TableGen.
  2. The transformed file, Options.inc, is included such that it defines a large enum with all the Swift options as values.
  3. The transformed file is included again, this time to initialize an LLVM OptTable. This class is capable of parsing command-line arguments into options and their values.
  4. libswiftDriver uses the OptTable to parse the arguments into an LLVM InputArgList, and the rest of the Swift compiler codebase queries that list to check for arguments as necessary.