gcc/doc/lto.texi

   1 @c Copyright (c) 2010 Free Software Foundation, Inc.
   2 @c Free Software Foundation, Inc.
   3 @c This is part of the GCC manual.
   4 @c For copying conditions, see the file gcc.texi.
   5 @c Contributed by Jan Hubicka <jh@suse.cz> and
   6 @c Diego Novillo <dnovillo@google.com>
   7
   8 @node LTO
   9 @chapter Link Time Optimization
  10 @cindex lto
  11 @cindex whopr
  12 @cindex wpa
  13 @cindex ltrans
  14
  15 @section Design Overview
  16
  17 Link time optimization is implemented as a GCC front end for a
  18 bytecode representation of GIMPLE that is emitted in special sections
  19 of @code{.o} files.  Currently, LTO support is enabled in most
  20 ELF-based systems, as well as darwin, cygwin and mingw systems.
  21
  22 Since GIMPLE bytecode is saved alongside final object code, object
  23 files generated with LTO support are larger than regular object files.
  24 This ``fat'' object format makes it easy to integrate LTO into
  25 existing build systems, as one can, for instance, produce archives of
  26 the files.  Additionally, one might be able to ship one set of fat
  27 objects which could be used both for development and the production of
  28 optimized builds.  A, perhaps surprising, side effect of this feature
  29 is that any mistake in the toolchain that leads to LTO information not
  30 being used (e.g.@: an older @code{libtool} calling @code{ld} directly).
  31 This is both an advantage, as the system is more robust, and a
  32 disadvantage, as the user is not informed that the optimization has
  33 been disabled.
  34
  35 The current implementation only produces ``fat'' objects, effectively
  36 doubling compilation time and increasing file sizes up to 5x the
  37 original size.  This hides the problem that some tools, such as
  38 @code{ar} and @code{nm}, need to understand symbol tables of LTO
  39 sections.  These tools were extended to use the plugin infrastructure,
  40 and with these problems solved, GCC will also support ``slim'' objects
  41 consisting of the intermediate code alone.
  42
  43 At the highest level, LTO splits the compiler in two.  The first half
  44 (the ``writer'') produces a streaming representation of all the
  45 internal data structures needed to optimize and generate code.  This
  46 includes declarations, types, the callgraph and the GIMPLE representation
  47 of function bodies.
  48
  49 When @option{-flto} is given during compilation of a source file, the
  50 pass manager executes all the passes in @code{all_lto_gen_passes}.
  51 Currently, this phase is composed of two IPA passes:
  52
  53 @itemize @bullet
  54 @item @code{pass_ipa_lto_gimple_out}
  55 This pass executes the function @code{lto_output} in
  56 @file{lto-streamer-out.c}, which traverses the call graph encoding
  57 every reachable declaration, type and function.  This generates a
  58 memory representation of all the file sections described below.
  59
  60 @item @code{pass_ipa_lto_finish_out}
  61 This pass executes the function @code{produce_asm_for_decls} in
  62 @file{lto-streamer-out.c}, which takes the memory image built in the
  63 previous pass and encodes it in the corresponding ELF file sections.
  64 @end itemize
  65
  66 The second half of LTO support is the ``reader''.  This is implemented
  67 as the GCC front end @file{lto1} in @file{lto/lto.c}.  When
  68 @file{collect2} detects a link set of @code{.o}/@code{.a} files with
  69 LTO information and the @option{-flto} is enabled, it invokes
  70 @file{lto1} which reads the set of files and aggregates them into a
  71 single translation unit for optimization.  The main entry point for
  72 the reader is @file{lto/lto.c}:@code{lto_main}.
  73
  74 @subsection LTO modes of operation
  75
  76 One of the main goals of the GCC link-time infrastructure was to allow
  77 effective compilation of large programs.  For this reason GCC implements two
  78 link-time compilation modes.
  79
  80 @enumerate
  81 @item   @emph{LTO mode}, in which the whole program is read into the
  82 compiler at link-time and optimized in a similar way as if it
  83 were a single source-level compilation unit.
  84
  85 @item   @emph{WHOPR or partitioned mode}, designed to utilize multiple
  86 CPUs and/or a distributed compilation environment to quickly link
  87 large applications.  WHOPR stands for WHOle Program optimizeR (not to
  88 be confused with the semantics of @option{-fwhole-program}).  It
  89 partitions the aggregated callgraph from many different @code{.o}
  90 files and distributes the compilation of the sub-graphs to different
  91 CPUs.
  92
  93 Note that distributed compilation is not implemented yet, but since
  94 the parallelism is facilitated via generating a @code{Makefile}, it
  95 would be easy to implement.
  96 @end enumerate
  97
  98 WHOPR splits LTO into three main stages:
  99 @enumerate
 100 @item Local generation (LGEN)
 101 This stage executes in parallel.  Every file in the program is compiled
 102 into the intermediate language and packaged together with the local
 103 call-graph and summary information.  This stage is the same for both
 104 the LTO and WHOPR compilation mode.
 105
 106 @item Whole Program Analysis (WPA)
 107 WPA is performed sequentially.  The global call-graph is generated, and
 108 a global analysis procedure makes transformation decisions.  The global
 109 call-graph is partitioned to facilitate parallel optimization during
 110 phase 3.  The results of the WPA stage are stored into new object files
 111 which contain the partitions of program expressed in the intermediate
 112 language and the optimization decisions.
 113
 114 @item Local transformations (LTRANS)
 115 This stage executes in parallel.  All the decisions made during phase 2
 116 are implemented locally in each partitioned object file, and the final
 117 object code is generated.  Optimizations which cannot be decided
 118 efficiently during the phase 2 may be performed on the local
 119 call-graph partitions.
 120 @end enumerate
 121
 122 WHOPR can be seen as an extension of the usual LTO mode of
 123 compilation.  In LTO, WPA and LTRANS are executed within a single
 124 execution of the compiler, after the whole program has been read into
 125 memory.
 126
 127 When compiling in WHOPR mode, the callgraph is partitioned during
 128 the WPA stage.  The whole program is split into a given number of
 129 partitions of roughly the same size.  The compiler tries to
 130 minimize the number of references which cross partition boundaries.
 131 The main advantage of WHOPR is to allow the parallel execution of
 132 LTRANS stages, which are the most time-consuming part of the
 133 compilation process.  Additionally, it avoids the need to load the
 134 whole program into memory.
 135
 136
 137 @section LTO file sections
 138
 139 LTO information is stored in several ELF sections inside object files.
 140 Data structures and enum codes for sections are defined in
 141 @file{lto-streamer.h}.
 142
 143 These sections are emitted from @file{lto-streamer-out.c} and mapped
 144 in all at once from @file{lto/lto.c}:@code{lto_file_read}.  The
 145 individual functions dealing with the reading/writing of each section
 146 are described below.
 147
 148 @itemize @bullet
 149 @item Command line options (@code{.gnu.lto_.opts})
 150
 151 This section contains the command line options used to generate the
 152 object files.  This is used at link time to determine the optimization
 153 level and other settings when they are not explicitly specified at the
 154 linker command line.
 155
 156 Currently, GCC does not support combining LTO object files compiled
 157 with different set of the command line options into a single binary.
 158 At link time, the options given on the command line and the options
 159 saved on all the files in a link-time set are applied globally.  No
 160 attempt is made at validating the combination of flags (other than the
 161 usual validation done by option processing).  This is implemented in
 162 @file{lto/lto.c}:@code{lto_read_all_file_options}.
 163
 164
 165 @item Symbol table (@code{.gnu.lto_.symtab})
 166
 167 This table replaces the ELF symbol table for functions and variables
 168 represented in the LTO IL.  Symbols used and exported by the optimized
 169 assembly code of ``fat'' objects might not match the ones used and
 170 exported by the intermediate code.  This table is necessary because
 171 the intermediate code is less optimized and thus requires a separate
 172 symbol table.
 173
 174 Additionally, the binary code in the ``fat'' object will lack a call
 175 to a function, since the call was optimized out at compilation time
 176 after the intermediate language was streamed out.  In some special
 177 cases, the same optimization may not happen during link-time
 178 optimization.  This would lead to an undefined symbol if only one
 179 symbol table was used.
 180
 181 The symbol table is emitted in
 182 @file{lto-streamer-out.c}:@code{produce_symtab}.
 183
 184
 185 @item Global declarations and types (@code{.gnu.lto_.decls})
 186
 187 This section contains an intermediate language dump of all
 188 declarations and types required to represent the callgraph, static
 189 variables and top-level debug info.
 190
 191 The contents of this section are emitted in
 192 @file{lto-streamer-out.c}:@code{produce_asm_for_decls}.  Types and
 193 symbols are emitted in a topological order that preserves the sharing
 194 of pointers when the file is read back in
 195 (@file{lto.c}:@code{read_cgraph_and_symbols}).
 196
 197
 198 @item The callgraph (@code{.gnu.lto_.cgraph})
 199
 200 This section contains the basic data structure used by the GCC
 201 inter-procedural optimization infrastructure.  This section stores an
 202 annotated multi-graph which represents the functions and call sites as
 203 well as the variables, aliases and top-level @code{asm} statements.
 204
 205 This section is emitted in
 206 @file{lto-streamer-out.c}:@code{output_cgraph} and read in
 207 @file{lto-cgraph.c}:@code{input_cgraph}.
 208
 209
 210 @item IPA references (@code{.gnu.lto_.refs})
 211
 212 This section contains references between function and static
 213 variables.  It is emitted by @file{lto-cgraph.c}:@code{output_refs}
 214 and read by @file{lto-cgraph.c}:@code{input_refs}.
 215
 216
 217 @item Function bodies (@code{.gnu.lto_.function_body.<name>})
 218
 219 This section contains function bodies in the intermediate language
 220 representation.  Every function body is in a separate section to allow
 221 copying of the section independently to different object files or
 222 reading the function on demand.
 223
 224 Functions are emitted in
 225 @file{lto-streamer-out.c}:@code{output_function} and read in
 226 @file{lto-streamer-in.c}:@code{input_function}.
 227
 228
 229 @item Static variable initializers (@code{.gnu.lto_.vars})
 230
 231 This section contains all the symbols in the global variable pool.  It
 232 is emitted by @file{lto-cgraph.c}:@code{output_varpool} and read in
 233 @file{lto-cgraph.c}:@code{input_cgraph}.
 234
 235 @item Summaries and optimization summaries used by IPA passes
 236 (@code{.gnu.lto_.<xxx>}, where @code{<xxx>} is one of @code{jmpfuncs},
 237 @code{pureconst} or @code{reference})
 238
 239 These sections are used by IPA passes that need to emit summary
 240 information during LTO generation to be read and aggregated at
 241 link time.  Each pass is responsible for implementing two pass manager
 242 hooks: one for writing the summary and another for reading it in.  The
 243 format of these sections is entirely up to each individual pass.  The
 244 only requirement is that the writer and reader hooks agree on the
 245 format.
 246 @end itemize
 247
 248
 249 @section Using summary information in IPA passes
 250
 251 Programs are represented internally as a @emph{callgraph} (a
 252 multi-graph where nodes are functions and edges are call sites)
 253 and a @emph{varpool} (a list of static and external variables in
 254 the program).
 255
 256 The inter-procedural optimization is organized as a sequence of
 257 individual passes, which operate on the callgraph and the
 258 varpool.  To make the implementation of WHOPR possible, every
 259 inter-procedural optimization pass is split into several stages
 260 that are executed at different times during WHOPR compilation:
 261
 262 @itemize @bullet
 263 @item LGEN time
 264 @enumerate
 265 @item @emph{Generate summary} (@code{generate_summary} in
 266 @code{struct ipa_opt_pass_d}).  This stage analyzes every function
 267 body and variable initializer is examined and stores relevant
 268 information into a pass-specific data structure.
 269
 270 @item @emph{Write summary} (@code{write_summary} in
 271 @code{struct ipa_opt_pass_d}).  This stage writes all the
 272 pass-specific information generated by @code{generate_summary}.
 273 Summaries go into their own @code{LTO_section_*} sections that
 274 have to be declared in @file{lto-streamer.h}:@code{enum
 275 lto_section_type}.  A new section is created by calling
 276 @code{create_output_block} and data can be written using the
 277 @code{lto_output_*} routines.
 278 @end enumerate
 279
 280 @item WPA time
 281 @enumerate
 282 @item @emph{Read summary} (@code{read_summary} in
 283 @code{struct ipa_opt_pass_d}).  This stage reads all the
 284 pass-specific information in exactly the same order that it was
 285 written by @code{write_summary}.
 286
 287 @item @emph{Execute} (@code{execute} in @code{struct
 288 opt_pass}).  This performs inter-procedural propagation.  This
 289 must be done without actual access to the individual function
 290 bodies or variable initializers.  Typically, this results in a
 291 transitive closure operation over the summary information of all
 292 the nodes in the callgraph.
 293
 294 @item @emph{Write optimization summary}
 295 (@code{write_optimization_summary} in @code{struct
 296 ipa_opt_pass_d}).  This writes the result of the inter-procedural
 297 propagation into the object file.  This can use the same data
 298 structures and helper routines used in @code{write_summary}.
 299 @end enumerate
 300
 301 @item LTRANS time
 302 @enumerate
 303 @item @emph{Read optimization summary}
 304 (@code{read_optimization_summary} in @code{struct
 305 ipa_opt_pass_d}).  The counterpart to
 306 @code{write_optimization_summary}.  This reads the interprocedural
 307 optimization decisions in exactly the same format emitted by
 308 @code{write_optimization_summary}.
 309
 310 @item @emph{Transform} (@code{function_transform} and
 311 @code{variable_transform} in @code{struct ipa_opt_pass_d}).
 312 The actual function bodies and variable initializers are updated
 313 based on the information passed down from the @emph{Execute} stage.
 314 @end enumerate
 315 @end itemize
 316
 317 The implementation of the inter-procedural passes are shared
 318 between LTO, WHOPR and classic non-LTO compilation.
 319
 320 @itemize
 321 @item During the traditional file-by-file mode every pass executes its
 322 own @emph{Generate summary}, @emph{Execute}, and @emph{Transform}
 323 stages within the single execution context of the compiler.
 324
 325 @item In LTO compilation mode, every pass uses @emph{Generate
 326 summary} and @emph{Write summary} stages at compilation time,
 327 while the @emph{Read summary}, @emph{Execute}, and
 328 @emph{Transform} stages are executed at link time.
 329
 330 @item In WHOPR mode all stages are used.
 331 @end itemize
 332
 333 To simplify development, the GCC pass manager differentiates
 334 between normal inter-procedural passes and small inter-procedural
 335 passes.  A @emph{small inter-procedural pass}
 336 (@code{SIMPLE_IPA_PASS}) is a pass that does
 337 everything at once and thus it can not be executed during WPA in
 338 WHOPR mode.  It defines only the @emph{Execute} stage and during
 339 this stage it accesses and modifies the function bodies.  Such
 340 passes are useful for optimization at LGEN or LTRANS time and are
 341 used, for example, to implement early optimization before writing
 342 object files.  The simple inter-procedural passes can also be used
 343 for easier prototyping and development of a new inter-procedural
 344 pass.
 345
 346
 347 @subsection Virtual clones
 348
 349 One of the main challenges of introducing the WHOPR compilation
 350 mode was addressing the interactions between optimization passes.
 351 In LTO compilation mode, the passes are executed in a sequence,
 352 each of which consists of analysis (or @emph{Generate summary}),
 353 propagation (or @emph{Execute}) and @emph{Transform} stages.
 354 Once the work of one pass is finished, the next pass sees the
 355 updated program representation and can execute.  This makes the
 356 individual passes dependent on each other.
 357
 358 In WHOPR mode all passes first execute their @emph{Generate
 359 summary} stage.  Then summary writing marks the end of the LGEN
 360 stage.  At WPA time,
 361 the summaries are read back into memory and all passes run the
 362 @emph{Execute} stage.  Optimization summaries are streamed and
 363 sent to LTRANS, where all the passes execute the @emph{Transform}
 364 stage.
 365
 366 Most optimization passes split naturally into analysis,
 367 propagation and transformation stages.  But some do not.  The
 368 main problem arises when one pass performs changes and the
 369 following pass gets confused by seeing different callgraphs
 370 between the @emph{Transform} stage and the @emph{Generate summary}
 371 or @emph{Execute} stage.  This means that the passes are required
 372 to communicate their decisions with each other.
 373
 374 To facilitate this communication, the GCC callgraph
 375 infrastructure implements @emph{virtual clones}, a method of
 376 representing the changes performed by the optimization passes in
 377 the callgraph without needing to update function bodies.
 378
 379 A @emph{virtual clone} in the callgraph is a function that has no
 380 associated body, just a description of how to create its body based
 381 on a different function (which itself may be a virtual clone).
 382
 383 The description of function modifications includes adjustments to
 384 the function's signature (which allows, for example, removing or
 385 adding function arguments), substitutions to perform on the
 386 function body, and, for inlined functions, a pointer to the
 387 function that it will be inlined into.
 388
 389 It is also possible to redirect any edge of the callgraph from a
 390 function to its virtual clone.  This implies updating of the call
 391 site to adjust for the new function signature.
 392
 393 Most of the transformations performed by inter-procedural
 394 optimizations can be represented via virtual clones.  For
 395 instance, a constant propagation pass can produce a virtual clone
 396 of the function which replaces one of its arguments by a
 397 constant.  The inliner can represent its decisions by producing a
 398 clone of a function whose body will be later integrated into
 399 a given function.
 400
 401 Using @emph{virtual clones}, the program can be easily updated
 402 during the @emph{Execute} stage, solving most of pass interactions
 403 problems that would otherwise occur during @emph{Transform}.
 404
 405 Virtual clones are later materialized in the LTRANS stage and
 406 turned into real functions.  Passes executed after the virtual
 407 clone were introduced also perform their @emph{Transform} stage
 408 on new functions, so for a pass there is no significant
 409 difference between operating on a real function or a virtual
 410 clone introduced before its @emph{Execute} stage.
 411
 412 Optimization passes then work on virtual clones introduced before
 413 their @emph{Execute} stage as if they were real functions.  The
 414 only difference is that clones are not visible during the
 415 @emph{Generate Summary} stage.
 416
 417 To keep function summaries updated, the callgraph interface
 418 allows an optimizer to register a callback that is called every
 419 time a new clone is introduced as well as when the actual
 420 function or variable is generated or when a function or variable
 421 is removed.  These hooks are registered in the @emph{Generate
 422 summary} stage and allow the pass to keep its information intact
 423 until the @emph{Execute} stage.  The same hooks can also be
 424 registered during the @emph{Execute} stage to keep the
 425 optimization summaries updated for the @emph{Transform} stage.
 426
 427 @subsection IPA references
 428
 429 GCC represents IPA references in the callgraph.  For a function
 430 or variable @code{A}, the @emph{IPA reference} is a list of all
 431 locations where the address of @code{A} is taken and, when
 432 @code{A} is a variable, a list of all direct stores and reads
 433 to/from @code{A}.  References represent an oriented multi-graph on
 434 the union of nodes of the callgraph and the varpool.  See
 435 @file{ipa-reference.c}:@code{ipa_reference_write_optimization_summary}
 436 and
 437 @file{ipa-reference.c}:@code{ipa_reference_read_optimization_summary}
 438 for details.
 439
 440 @subsection Jump functions
 441 Suppose that an optimization pass sees a function @code{A} and it
 442 knows the values of (some of) its arguments.  The @emph{jump
 443 function} describes the value of a parameter of a given function
 444 call in function @code{A} based on this knowledge.
 445
 446 Jump functions are used by several optimizations, such as the
 447 inter-procedural constant propagation pass and the
 448 devirtualization pass.  The inliner also uses jump functions to
 449 perform inlining of callbacks.
 450
 451 @section Whole program assumptions, linker plugin and symbol visibilities
 452
 453 Link-time optimization gives relatively minor benefits when used
 454 alone.  The problem is that propagation of inter-procedural
 455 information does not work well across functions and variables
 456 that are called or referenced by other compilation units (such as
 457 from a dynamically linked library).  We say that such functions
 458 are variables are @emph{externally visible}.
 459
 460 To make the situation even more difficult, many applications
 461 organize themselves as a set of shared libraries, and the default
 462 ELF visibility rules allow one to overwrite any externally
 463 visible symbol with a different symbol at runtime.  This
 464 basically disables any optimizations across such functions and
 465 variables, because the compiler cannot be sure that the function
 466 body it is seeing is the same function body that will be used at
 467 runtime.  Any function or variable not declared @code{static} in
 468 the sources degrades the quality of inter-procedural
 469 optimization.
 470
 471 To avoid this problem the compiler must assume that it sees the
 472 whole program when doing link-time optimization.  Strictly
 473 speaking, the whole program is rarely visible even at link-time.
 474 Standard system libraries are usually linked dynamically or not
 475 provided with the link-time information.  In GCC, the whole
 476 program option (@option{-fwhole-program}) asserts that every
 477 function and variable defined in the current compilation
 478 unit is static, except for function @code{main} (note: at
 479 link time, the current unit is the union of all objects compiled
 480 with LTO).  Since some functions and variables need to
 481 be referenced externally, for example by another DSO or from an
 482 assembler file, GCC also provides the function and variable
 483 attribute @code{externally_visible} which can be used to disable
 484 the effect of @option{-fwhole-program} on a specific symbol.
 485
 486 The whole program mode assumptions are slightly more complex in
 487 C++, where inline functions in headers are put into @emph{COMDAT}
 488 sections.  COMDAT function and variables can be defined by
 489 multiple object files and their bodies are unified at link-time
 490 and dynamic link-time.  COMDAT functions are changed to local only
 491 when their address is not taken and thus un-sharing them with a
 492 library is not harmful.  COMDAT variables always remain externally
 493 visible, however for readonly variables it is assumed that their
 494 initializers cannot be overwritten by a different value.
 495
 496 GCC provides the function and variable attribute
 497 @code{visibility} that can be used to specify the visibility of
 498 externally visible symbols (or alternatively an
 499 @option{-fdefault-visibility} command line option).  ELF defines
 500 the @code{default}, @code{protected}, @code{hidden} and
 501 @code{internal} visibilities.
 502
 503 The most commonly used is visibility is @code{hidden}.  It
 504 specifies that the symbol cannot be referenced from outside of
 505 the current shared library.  Unfortunately, this information
 506 cannot be used directly by the link-time optimization in the
 507 compiler since the whole shared library also might contain
 508 non-LTO objects and those are not visible to the compiler.
 509
 510 GCC solves this problem using linker plugins.  A @emph{linker
 511 plugin} is an interface to the linker that allows an external
 512 program to claim the ownership of a given object file.  The linker
 513 then performs the linking procedure by querying the plugin about
 514 the symbol table of the claimed objects and once the linking
 515 decisions are complete, the plugin is allowed to provide the
 516 final object file before the actual linking is made.  The linker
 517 plugin obtains the symbol resolution information which specifies
 518 which symbols provided by the claimed objects are bound from the
 519 rest of a binary being linked.
 520
 521 Currently, the linker plugin  works only in combination
 522 with the Gold linker, but a GNU ld implementation is under
 523 development.
 524
 525 GCC is designed to be independent of the rest of the toolchain
 526 and aims to support linkers without plugin support.  For this
 527 reason it does not use the linker plugin by default.  Instead,
 528 the object files are examined by @command{collect2} before being
 529 passed to the linker and objects found to have LTO sections are
 530 passed to @command{lto1} first.  This mode does not work for
 531 library archives.  The decision on what object files from the
 532 archive are needed depends on the actual linking and thus GCC
 533 would have to implement the linker itself.  The resolution
 534 information is missing too and thus GCC needs to make an educated
 535 guess based on @option{-fwhole-program}.  Without the linker
 536 plugin GCC also assumes that symbols are declared @code{hidden}
 537 and not referred by non-LTO code by default.
 538
 539 @section Internal flags controlling @code{lto1}
 540
 541 The following flags are passed into @command{lto1} and are not
 542 meant to be used directly from the command line.
 543
 544 @itemize
 545 @item -fwpa
 546 @opindex fwpa
 547 This option runs the serial part of the link-time optimizer
 548 performing the inter-procedural propagation (WPA mode).  The
 549 compiler reads in summary information from all inputs and
 550 performs an analysis based on summary information only.  It
 551 generates object files for subsequent runs of the link-time
 552 optimizer where individual object files are optimized using both
 553 summary information from the WPA mode and the actual function
 554 bodies.  It then drives the LTRANS phase.
 555
 556 @item -fltrans
 557 @opindex fltrans
 558 This option runs the link-time optimizer in the
 559 local-transformation (LTRANS) mode, which reads in output from a
 560 previous run of the LTO in WPA mode.  In the LTRANS mode, LTO
 561 optimizes an object and produces the final assembly.
 562
 563 @item -fltrans-output-list=@var{file}
 564 @opindex fltrans-output-list
 565 This option specifies a file to which the names of LTRANS output
 566 files are written.  This option is only meaningful in conjunction
 567 with @option{-fwpa}.
 568 @end itemize