@c gcc/f/ffe.texi
@c Copyright (C) 1999, 2003 Free Software Foundation, Inc.
@c This is part of the G77 manual.
@c For copying conditions, see the file g77.texi.

@node Front End
@chapter Front End
@cindex GNU Fortran Front End (FFE)
@cindex FFE
@cindex @code{g77}, front end
@cindex front end, @code{g77}

This chapter describes some aspects of the design and implementation
of the @code{g77} front end.

To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
email @email{gcc@@gcc.gnu.org}.
If you're planning to do more than just research issues and offer comments,
see @uref{http://gcc.gnu.org/contribute.html} for steps you might
need to take first.

@menu
* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
* Internal Naming Conventions::
@end menu

@node Overview of Sources
@section Overview of Sources

The current directory layout includes the following:

@table @file
@item @var{srcdir}/gcc/
Non-@code{g77} files in @code{gcc}

@item @var{srcdir}/gcc/f/
GNU Fortran front end sources

@item @var{srcdir}/libf2c/
@code{libg2c} configuration and @code{g2c.h} file generation

@item @var{srcdir}/libf2c/libF77/
General support and math portion of @code{libg2c}

@item @var{srcdir}/libf2c/libI77/
I/O portion of @code{libg2c}

@item @var{srcdir}/libf2c/libU77/
Additional interfaces to Unix @code{libc} for @code{libg2c}
@end table

Components of note in @code{g77} are described below.

@file{f/} as a whole contains the source for @code{g77},
while @file{libf2c/} contains a portion of the separate program
@code{f2c}.
Note that the @code{libf2c} code is not part of the program @code{g77},
just distributed with it.

@file{f/} contains text files that document the Fortran compiler, source
files for the GNU Fortran Front End (FFE), and some other stuff.
The @code{g77} compiler code is placed in @file{f/} because that directory,
along with its contents,
is designed to be a subdirectory of a @code{gcc} source directory,
@file{gcc/},
which is structured so that language-specific front ends can be ``dropped
in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
the @file{cp/} subdirectory.
Note that the C front end (also referred to as @code{gcc})
is an exception to this, as its source files reside
in the @file{gcc/} directory itself.

@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
When built as part of @code{g77},
@code{libf2c} is installed under the name @code{libg2c} to avoid
conflict with any existing version of @code{libf2c},
and thus is often referred to as @code{libg2c} when the
@code{g77} version is specifically being referred to.

The @code{netlib} version of @code{libf2c/}
contains two distinct libraries,
@code{libF77} and @code{libI77},
each in its own subdirectory.
In @code{g77}, this distinction is not made,
beyond maintaining the subdirectory structure in the source-code tree.

@file{libf2c/} is not part of the program @code{g77},
just distributed with it.
It contains files not present
in the official (@code{netlib}) version of @code{libf2c},
and also contains some minor changes to @code{libf2c},
made to fix some bugs
and to facilitate automatic configuration, building, and installation of
@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
See @file{libf2c/README} for more information,
including licensing conditions
governing distribution of programs containing code from @code{libg2c}.

@code{libg2c}, @code{g77}'s version of @code{libf2c},
adds Dave Love's implementation of @code{libU77},
in the @file{libf2c/libU77/} directory.
This library is distributed under the
GNU Library General Public License (LGPL)---see the
file @file{libf2c/libU77/COPYING.LIB}
for more information,
as this license
governs distribution conditions for programs containing code
from this portion of the library.

Files of note in @file{f/} and @file{libf2c/} are described below:

@table @file
@item f/BUGS
Lists some important bugs known to be in @code{g77}.
Or use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:

@smallexample
info -f f/g77.info -n "Actual Bugs"
@end smallexample

@item f/ChangeLog
Lists recent changes to @code{g77} internals.

@item libf2c/ChangeLog
Lists recent changes to @code{libg2c} internals.

@item f/NEWS
Contains the per-release changes.
These include the user-visible
changes described in the node ``Changes''
in the @code{g77} documentation, plus internal
changes of import.
Or use Info:

@smallexample
info -f f/g77.info -n News
@end smallexample

@item f/g77.info*
The @code{g77} documentation, in Info format,
produced by building @code{g77}.

All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
aren't yet accustomed to using these tools.
All of these files are readable as ``plain text'' files,
though they're easier to navigate using Info readers
such as @code{info} and GNU Emacs Info mode.
@end table

If you want to explore the FFE code, which lives entirely in @file{f/},
here are a few clues.
The file @file{g77spec.c} contains the @code{g77}-specific source code
for the @code{g77} command only---this just forms a variant of the
@code{gcc} command, so,
just as the @code{gcc} command itself does not contain the C front end,
the @code{g77} command does not contain the Fortran front end (FFE).
The FFE code ends up in an executable named @file{f771},
which does the actual compiling,
so it contains the FFE plus the @code{gcc} back end (GBE),
the latter to do most of the optimization and the code generation.

The file @file{parse.c} is the source file for @code{yyparse()},
which the GBE invokes to start the compilation process
in @file{f771}.

The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, together with @file{top.h}, defines all @samp{ffe_[a-z].*},
@samp{ffe[A-Z].*}, and @samp{FFE_[A-Za-z].*} symbols.

The file @file{fini.c} is a @code{main()} program that is used when building
the FFE to generate C header and source files for recognizing keywords.
The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
@samp{MALLOC_[A-Za-z].*} symbols.

All other modules named @var{xyz}
consist of all files named @samp{@var{xyz}*.@var{ext}}
and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
If you understand all this, congratulations---it's easier for me to remember
how it works than to type in these regular expressions.
But it does make it easy to find where a symbol is defined.
For example, the symbol @samp{ffexyz_set_something} would be defined
in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
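
Because the naming rules are mechanical, they can be turned into a lookup.
The following Python sketch applies the patterns exactly as listed above
and returns the module prefix that should define a symbol.
It returns @code{None} when no listed pattern covers the symbol;
some real FFE names may fall outside these patterns.
The function name @code{module_for_symbol} is hypothetical:

@smallexample
```python
import re


def module_for_symbol(symbol):
    # ffe_[a-z].*, ffe[A-Z].*, FFE_[A-Za-z].*  -> top.c / top.h
    if re.match(r"ffe_[a-z]|ffe[A-Z]|FFE_[A-Za-z]", symbol):
        return "top"
    # malloc_[a-z].*, malloc[A-Z].*, MALLOC_[A-Za-z].*  -> malloc.c / malloc.h
    if re.match(r"malloc_[a-z]|malloc[A-Z]|MALLOC_[A-Za-z]", symbol):
        return "malloc"
    # ffexyz_[a-z].* or ffexyz[A-Z].*  -> module xyz
    match = re.match(r"ffe([a-z]+)(_[a-z]|[A-Z])", symbol)
    if match:
        return match.group(1)
    # FFEXYZ_[A-Za-z].*  -> module xyz
    match = re.match(r"FFE([A-Z]+)_[A-Za-z]", symbol)
    if match:
        return match.group(1).lower()
    return None


print(module_for_symbol("ffexyz_set_something"))   # xyz
print(module_for_symbol("ffestd_exec_end"))        # std
print(module_for_symbol("ffe_file"))               # top
```
@end smallexample

The result names the module.
For example, @samp{ffestd_exec_end} should be found in @file{std.h} or @file{std.c}.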

The ``porting'' files of note currently are:

@table @file
@item proj.h
This defines the ``language'' used by all the other source files,
the language being Standard C plus some useful things
like @code{ARRAY_SIZE} and such.

@item target.c
@itemx target.h
These describe the target machine
in terms of what data types are supported,
how they are denoted
(to what C type does an @code{INTEGER*8} map, for example),
how to convert between them,
and so on.
Over time, newer versions of @code{g77} rely less on these files
and more on run-time configuration based on GBE info
in @file{com.c}.

@item com.c
@itemx com.h
These are the primary interface to the GBE.

@item ste.c
@itemx ste.h
These contain code for implementing recognized executable statements
in the GBE.

@item src.c
@itemx src.h
These contain information on the format(s) of source files
(such as whether they are never to be processed case-insensitively
with regard to Fortran keywords).
@end table

If you want to debug the @file{f771} executable,
for example if it crashes,
note that the global variables @code{lineno} and @code{input_filename}
are usually set to reflect the current line being read by the lexer
during the first-pass analysis of a program unit and to reflect
the current line being processed during the second-pass compilation
of a program unit.

If an invocation of the function @code{ffestd_exec_end} is on the stack,
the compiler is in the second pass; otherwise it is in the first.

(This information might help you reduce a test case and/or work around
a bug in @code{g77} until a fix is available.)

@node Overview of Translation Process
@section Overview of Translation Process

The order of phases translating source code to the form accepted
by the GBE is:

@enumerate
@item
Stripping punched-card sources (@file{g77stripcard.c})

@item
Lexing (@file{lex.c})

@item
Stand-alone statement identification (@file{sta.c})

@item
INCLUDE handling (@file{sti.c})

@item
Order-dependent statement identification (@file{stq.c})

@item
Parsing (@file{stb.c} and @file{expr.c})

@item
Constructing (@file{stc.c})

@item
Collecting (@file{std.c})

@item
Expanding (@file{ste.c})
@end enumerate

To get a rough idea of how a particularly twisted Fortran statement
gets treated by the passes, consider:

@smallexample
      FORMAT(I2 4H)=(J/
     &   I3)
@end smallexample

The job of @file{lex.c} is to know enough about Fortran syntax rules
to break the statement up into distinct lexemes without requiring
any feedback from subsequent phases:

@smallexample
`FORMAT'
`('
`I24H'
`)'
`='
`('
`J'
`/'
`I3'
`)'
@end smallexample
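
The following Python toy reproduces that lexeme list.
It is not how @file{lex.c} is written, and the function names are hypothetical.
Continuation lines are simply joined after column 6.
The toy also assumes the statement contains no @code{CHARACTER} or Hollerith constants,
inside which spaces would matter.
Outside such constants, spaces in fixed form are insignificant.
That is why @samp{I2 4H} becomes the single lexeme @samp{I24H},
while every punctuation character stays a lexeme of its own:

@smallexample
```python
def join_fixed_form(lines):
    # Keep only the statement part (column 7 onward) of an initial line
    # and its continuation lines, and join them into one statement.
    return "".join(line[6:] for line in lines)


def lex_fixed_form(statement):
    # Spaces are skipped, so runs of letters and digits are gathered
    # across them; every other character is a lexeme by itself.
    lexemes = []
    word = ""
    for ch in statement:
        if ch == " ":
            continue
        if ch.isalnum():
            word += ch
        else:
            if word:
                lexemes.append(word)
                word = ""
            lexemes.append(ch)
    if word:
        lexemes.append(word)
    return lexemes


source = [
    "      FORMAT(I2 4H)=(J/",
    "     &   I3)",
]
print(lex_fixed_form(join_fixed_form(source)))
# prints ['FORMAT', '(', 'I24H', ')', '=', '(', 'J', '/', 'I3', ')']
```
@end smallexample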

The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that sequence of lexemes represents.

The sooner it can do this (in terms of using the smallest number of
lexemes, starting with the first for each statement), the better,
because that leaves diagnostics for problems beyond the recognition
of the statement form to subsequent phases,
which can usually better describe the nature of the problem.

In this case, the @samp{=} at ``level zero''
(not nested within parentheses)
tells @file{sta.c} that this is an @emph{assignment-form},
not @code{FORMAT}, statement.
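
Detecting a level-zero @samp{=} only takes a parenthesis counter.
Below is a minimal Python sketch (hypothetical function name) over lexemes like those above.
It deliberately ignores other statement forms that also have a level-zero @samp{=},
such as @code{DO} statements, which @file{sta.c} has to distinguish by other means:

@smallexample
```python
def is_assignment_form(lexemes):
    # True if an "=" appears outside all parentheses ("level zero").
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False


twisted = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(is_assignment_form(twisted))                      # True
print(is_assignment_form(["FORMAT", "(", "I2", ")"]))   # False
```
@end smallexample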

An assignment-form statement might be a statement-function
definition or an executable assignment statement.

To make that determination,
@file{sta.c} looks at the first two lexemes.

Since the second lexeme is @samp{(},
the first must represent an array for this to be an assignment statement;
otherwise it's a statement function.

Either way, @file{sta.c} hands off the statement to @file{stq.c}
(via @file{sti.c}, which expands INCLUDE files).
@file{stq.c} figures out what a statement that is ambiguous on its own
must actually be, based on the context
established by previous statements.

So, @file{stq.c} watches the statement stream for executable statements,
END statements, and so on, so it knows whether @samp{A(B)=C} is
(intended as) a statement-function definition or an assignment statement.
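
The shape of that decision is sketched below in Python.
This is a simplified model, not the logic of @file{stq.c}, and the names are hypothetical.
The caller is assumed to track whether an executable statement has been seen
in the current program unit (clearing that at @code{END}),
and to know which names are arrays.
The sketch ignores @code{CHARACTER} substring assignments such as @samp{S(1:2)=T}.
It leans on the Fortran rule that statement-function definitions
must precede the executable statements of a program unit:

@smallexample
```python
def classify_assignment_form(lexemes, seen_executable, array_names):
    # Decide what an assignment-form statement such as A(B)=C really is.
    name = lexemes[0]
    if lexemes[1] != "(":
        return "assignment"          # e.g. X=Y
    if seen_executable or name in array_names:
        # An array element store; if A turns out not to be an array,
        # a later phase would have to diagnose it.
        return "assignment"
    return "statement function"      # defines statement function A


stmt = ["A", "(", "B", ")", "=", "C"]
print(classify_assignment_form(stmt, False, set()))   # statement function
print(classify_assignment_form(stmt, True, set()))    # assignment
print(classify_assignment_form(stmt, False, ["A"]))   # assignment
```
@end smallexample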

After establishing the context-aware statement info, @file{stq.c}
passes the original sample statement on to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).

@file{stb.c} forms a
statement-specific record containing the pertinent information.
That information includes a source expression and,
for an assignment statement, a destination expression.
Expressions are parsed by @file{expr.c}.

This record is passed to @file{stc.c},
which copes with the implications of the statement
within the context established by previous statements.

For example, if it's the first statement in the file
or after an @code{END} statement,
@file{stc.c} recognizes that, first of all,
a main program unit is now being lexed
(and tells that to @file{std.c}
before telling it about the current statement).

@file{stc.c} attaches whatever information it can,
usually derived from the context established by the preceding statements,
and passes the information to @file{std.c}.

@file{std.c} saves this information away,
since the GBE cannot cope with information
that might be incomplete at this stage.

For example, @samp{I3} might later be determined
to be an argument to an alternate @code{ENTRY} point.

When @file{std.c} is told about the end of an external (top-level)
program unit,
it passes all the information it has saved away
on statements in that program unit
to @file{ste.c}.

@file{ste.c} ``expands'' each statement, in sequence, by
constructing the appropriate GBE information and calling
the appropriate GBE routines.

Details on the transformational phases follow.
Keep in mind that Fortran numbering is used,
so the first character on a line is column 1,
decimal numbering is used, and so on.

@menu
* g77stripcard::
* lex.c::
* sta.c::
* sti.c::
* stq.c::
* stb.c::
* expr.c::
* stc.c::
* std.c::
* ste.c::

* Gotchas (Transforming)::
* TBD (Transforming)::
@end menu

@node g77stripcard
@subsection g77stripcard

The @code{g77stripcard} program handles removing content beyond
column 72 (adjustable via a command-line option),
optionally warning about that content being something other
than trailing whitespace or Fortran commentary.

This program is needed because @file{lex.c} doesn't pay attention
to maximum line lengths at all, to make it easier to maintain,
as well as faster (for sources that don't depend on the maximum
column length vis-a-vis trailing non-blank non-commentary content).

Just how this program will be run, whether automatically for
old source (perhaps as the default for @file{.f} files), is not
yet determined.

In the meantime, it might as well be implemented as a typical UNIX pipe.

It should accept a @samp{-fline-length-@var{n}} option,
with the default line length set to 72.

When the text it strips off the end of a line is not blank
(not just spaces and tabs),
it should insert an additional comment line
(beginning with @samp{!},
so it works for both fixed-form and free-form files)
containing the text,
following the stripped line.
The inserted comment should have a prefix of some kind,
TBD, that distinguishes the comment as representing stripped text.
Users could use that to @code{sed} out such lines, if they wished---it
seems silly to provide a command-line option to delete information
when it can be so easily filtered out by another program.

(This inserted comment should be designed to ``fit in'' well
with whatever the Fortran community is using these days for
preprocessor, translator, and other such products, like OpenMP.
What that's all about, and how @code{g77} can elegantly fit its
special comment conventions into it all, is TBD as well.
We don't want to reinvent the wheel here, but if there turn out
to be too many conflicting conventions, we might have to invent
one that looks nothing like the others, but which offers their
host products a better infrastructure in which to fit and coexist
peacefully.)
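
Below is a Python sketch of the per-line behavior described above.
The function name and the @samp{!STRIPPED: } prefix are hypothetical placeholders;
the real prefix is TBD.
Columns are counted as raw characters with no tab expansion,
and the optional warning is left out.
A pipe version would apply it to each line read from standard input,
taking the line length from @samp{-fline-length-@var{n}}:

@smallexample
```python
STRIPPED_PREFIX = "!STRIPPED: "   # hypothetical; the real prefix is TBD


def strip_card_line(line, line_length=72):
    # Cut the line after line_length columns.  If what was cut off is
    # not just spaces and tabs, keep it in a following "!" comment line.
    kept, removed = line[:line_length], line[line_length:]
    output = [kept]
    if removed.strip(" \t"):
        output.append(STRIPPED_PREFIX + removed)
    return output


card = "      CALL FOO".ljust(72) + "00010000"
for out_line in strip_card_line(card):
    print(out_line)
```
@end smallexample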

@code{g77stripcard} probably shouldn't do any tab expansion or other
fancy stuff.
People can use @code{expand} or other pre-filtering if they like.
The idea here is to keep each stage quite simple, while providing
excellent performance for ``normal'' code.

(Code with junk beyond column 72 is not really ``normal'',
as it comes from a card-punch heritage,
and will be increasingly hard for tomorrow's Fortran programmers to read.)

@node lex.c
@subsection lex.c

To help make the lexer simple, fast, and easy to maintain,
while also having @code{g77} generally encourage Fortran programmers
to write simple, maintainable, portable code by maximizing the
performance of compiling that kind of code:

@itemize @bullet
@item
There'll be just one lexer, for both fixed-form and free-form source.

@item
It'll care about the form only when handling the first 7 columns of
text, stuff like spaces between strings of alphanumerics, and
how lines are continued.

Some other distinctions will be handled by subsequent phases,
so at least one of them will have to know which form is involved.

For example, @samp{I = 2 . 4} is acceptable in fixed form,
and works in free form as well given the implementation @code{g77}
presently uses.
But the standard requires a diagnostic for it in free form,
so the parser has to be able to recognize that
the lexemes aren't contiguous
(information the lexer @emph{does} have to provide)
and that free-form source is being parsed,
so it can provide the diagnostic.

The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
Otherwise, it'd have to know a whole lot more about how to parse Fortran,
or subsequent phases (mainly parsing) would have two paths through
lots of critical code: one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.

@item
It won't worry about line lengths
(beyond the first 7 columns for fixed-form source).

That is, once it starts parsing the ``statement'' part of a line
(column 7 for fixed-form, column 1 for free-form),
it'll keep going until it finds a newline,
rather than ignoring everything past a particular column
(72 or 132).

The implication here is that there shouldn't @emph{be}
anything past that last column, other than whitespace or
commentary, because users using typical editors
(or viewing output as typically printed)
won't necessarily know just where the last column is.

Code that has ``garbage'' beyond the last column
(almost certainly only fixed-form code with a punched-card legacy,
such as code using columns 73-80 for ``sequence numbers'')
will have to be run through @code{g77stripcard} first.

Also, keeping track of the maximum column position while also watching out
for the end of a line @emph{and} while reading from a file
just makes things slower.
Since a file must be read, and watching for the end of the line
is necessary (unless the typical input file was preprocessed to
include the necessary number of trailing spaces),
dropping the tracking of the maximum column position
is the only way to reduce the complexity of the pertinent code
while maintaining high performance.

@item
ASCII encoding is assumed for the input file.

Code written in other character sets will have to be converted first.

@item
Tabs (ASCII code 9)
will be converted to spaces via the straightforward
approach.

Specifically, a tab is converted to between one and eight spaces
as necessary to reach column @var{n},
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.

That saves having to pass most source files through @code{expand}.

@item
Linefeeds (ASCII code 10)
mark the ends of lines.

@item
A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
in which case it is ignored.

Otherwise, it is rejected (with a diagnostic).

@item
Any characters other than the above
that are not part of the GNU Fortran Character Set
(@pxref{Character Set})
are rejected with a diagnostic.

This includes backspaces, form feeds, and the like.

(It might make sense to allow a form feed in column 1
as long as that's the only character on a line.
It certainly wouldn't seem to cost much in terms of performance.)

@item
The end of the input stream (EOF)
ends the current line.

@item
The distinction between uppercase and lowercase letters
will be preserved.

It will be up to subsequent phases to decide to fold case.

Current plans are to permit any casing for Fortran (reserved) keywords
while preserving casing for user-defined names.
(This might not be made the default for @file{.f} files, though.)

Preserving case seems necessary to provide more direct access
to facilities outside of @code{g77}, such as to C or Pascal code.

Names of intrinsics will probably be matchable in any case.

(How @samp{external SiN; r = sin(x)} would be handled is TBD.
I think old @code{g77} might already handle that pretty elegantly,
but whether we can cope with allowing the same fragment to reference
a @emph{different} procedure, even with the same interface,
via @samp{s = SiN(r)}, needs to be determined.
If it can't, we need to make sure that when code introduces
a user-defined name, any intrinsic matching that name
using a case-insensitive comparison
is ``turned off''.)

@item
Backslashes in @code{CHARACTER} and Hollerith constants
are not allowed.

This avoids the confusion introduced by some Fortran compiler vendors
providing C-like interpretation of backslashes,
while others provide straight-through interpretation.

Some kind of lexical construct (TBD) will be provided to allow
flagging of a @code{CHARACTER}
(but probably not a Hollerith)
constant that permits backslashes.
It'll necessarily be a prefix, such as:

@smallexample
PRINT *, C'This line has a backspace \b here.'
PRINT *, F'This line has a straight backslash \ here.'
@end smallexample

Further, command-line options might be provided to specify that
one prefix or the other is to be assumed as the default
for @code{CHARACTER} constants.

However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
(or just those containing backslashes)
with the desired designation,
so printouts of code can be read
without knowing the compile-time options used when compiling it.

If such a program is provided
(let's name it @code{g77slash} for now),
then a command-line option to @code{g77} should not be provided.
(Though, given that it'll be easy to implement, it might be hard
to resist user requests for it ``to compile faster than if we
have to invoke another filter''.)

This program would take a command-line option to specify the
default interpretation of backslashes,
affecting which prefix it uses for constants.

@code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
to the appropriate @code{CHARACTER} constants.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.

@item
To allow for form-neutral INCLUDE files without requiring them
to be preprocessed,
the fixed-form lexer should offer an extension (if possible)
allowing a trailing @samp{&} to be ignored, especially if after
column 72, as it would be using the traditional Unix Fortran source
model (which ignores @emph{everything} after column 72).
@end itemize

The above implements nearly exactly what is specified by
@ref{Character Set},
and
@ref{Lines},
except it also provides automatic conversion of tabs
and ignoring of newline-related carriage returns,
as well as accommodating form-neutral INCLUDE files.
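
The line-level rules from the list above fit in a short Python sketch.
Those rules cover tabs, linefeeds, carriage returns, rejected characters, and EOF.
The sketch only illustrates the rules and is not code from @file{lex.c}.
The function name is hypothetical, and printable ASCII (codes 32 through 126)
is an assumed stand-in for the GNU Fortran Character Set:

@smallexample
```python
def split_source_lines(text):
    # LF ends a line; CR is dropped only right before LF; a tab advances
    # to the next column n with (n - 1) a multiple of 8; EOF ends an
    # unterminated last line; anything outside printable ASCII (an
    # assumed stand-in for the GNU Fortran Character Set) is rejected.
    lines = []
    current = ""
    for i, ch in enumerate(text):
        if ch == "\n":
            lines.append(current)
            current = ""
        elif ch == "\r":
            if i + 1 >= len(text) or text[i + 1] != "\n":
                raise ValueError("stray carriage return at offset %d" % i)
            # A CR just before LF is ignored; the LF ends the line.
        elif ch == "\t":
            current += " " * (8 - len(current) % 8)
        elif 32 <= ord(ch) <= 126:
            current += ch
        else:
            raise ValueError("invalid character code %d at offset %d"
                             % (ord(ch), i))
    if current:
        lines.append(current)
    return lines


print(split_source_lines("      X = 1\r\n\tY = 2"))
```
@end smallexample

In the sample call, the CR before the first LF is dropped.
The tab expands to eight spaces, putting @samp{Y} in column 9,
and the missing final newline still ends the second line.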

The lexer design also implements the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
doesn't need any special knowledge
of whether spaces on the screen are really tabs,
whether lines end immediately after the last visible non-space character
or after a number of spaces and tabs that follow it,
or whether the last line in the file is ended by a newline.

Most editors don't make these distinctions,
the ANSI FORTRAN 77 standard doesn't require them to,
and it permits a standard-conforming compiler
to define a method for transforming source code to
``standard form'' however it wants.

So, GNU Fortran defines it such that users have the best chance
of having the code be interpreted the way it looks on the screen
of the typical editor.

(Fancy editors should @emph{never} be required to correctly read code
written in classic two-dimensional-plaintext form.
By correct reading I mean the ability to read it, book-like, without
mistaking text ignored by the compiler for program code and vice versa,
and without having to count beyond the first several columns.
The vague meaning of ASCII TAB, among other things, complicates
this somewhat, but as long as ``everyone'', including the editor,
other tools, and printer, agrees about the every-eighth-column convention,
the GNU Fortran ``pure visual'' model meets these requirements.
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
and so on, fails to meet this fairly straightforward specification.
Fortunately, Fortran @emph{itself} does not mandate such a failure,
though most vendor-supplied defaults for their Fortran compilers @emph{do}
fail to meet this specification for readability.)

Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
to produce input to this phase of @code{g77}.
Mainly, they need not worry about long lines.

@node sta.c
@subsection sta.c

@node sti.c
@subsection sti.c

@node stq.c
@subsection stq.c

@node stb.c
@subsection stb.c

@node expr.c
@subsection expr.c

@node stc.c
@subsection stc.c

@node std.c
@subsection std.c

@node ste.c
@subsection ste.c

@node Gotchas (Transforming)
@subsection Gotchas (Transforming)

This section is not about transforming ``gotchas'' into something else.
It is about the weirder aspects of transforming Fortran,
however that's defined,
into a more modern, canonical form.

 737 @subsubsection Multi-character Lexemes
 738
 739 Each lexeme carries with it a pointer to where it appears in the source.
 740
 741 To provide the ability for diagnostics to point to column numbers,
 742 in addition to line numbers and names,
 743 lexemes that represent more than one (significant) character
 744 in the source code need, generally,
 745 to provide pointers to where each @emph{character} appears in the source.
 746
 747 This provides the ability to properly identify the precise location
 748 of the problem in code like
 749
 750 @smallexample
 751 SUBROUTINE X
 752 END
 753 BLOCK DATA X
 754 END
 755 @end smallexample
 756
 757 which, in fixed-form source, would result in single lexemes
 758 consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
 759 (The problem is that @samp{X} is defined twice,
 760 so a pointer to the @samp{X} in the second definition,
 761 as well as a follow-up pointer to the corresponding pointer in the first,
 762 would be preferable to pointing to the beginnings of the statements.)
 763
 764 This need also arises when parsing (and diagnosing) @code{FORMAT}
 765 statements.
 766
 767 Further, it arises when diagnosing
 768 @code{FMT=} specifiers that contain constants
 769 (or partial constants, or even propagated constants!)
 770 in I/O statements, as in:
 771
 772 @smallexample
 773 PRINT '(I2, 3HAB)', J
 774 @end smallexample
 775
 776 (A pointer to the beginning of the prematurely-terminated Hollerith
 777 constant, and/or to the close parenthese, is preferable to a pointer
 778 to the open-parenthese or the apostrophe that precedes it.)
 779
 780 Multi-character lexemes, which would seem to naturally include
 781 at least digit strings, alphanumeric strings, @code{CHARACTER}
 782 constants, and Hollerith constants, therefore need to provide
 783 location information on each character.
 784 (Maybe Hollerith constants don't, but it's unnecessary to except them.)
 785
 786 The question then arises, what about @emph{other} multi-character lexemes,
 787 such as @samp{**} and @samp{//},
 788 and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
 789
 790 Turns out there's a need to identify the location of the second character
 791 of these two-character lexemes.
 792 For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
 793 as the problem, not the open parenthese.
 794 Similarly, it is preferable to diagnose the second slash in
 795 @samp{I = J // K} rather than the first, given the implicit typing
 796 rules, which would result in the compiler disallowing the attempted
 797 concatenation of two integers.
 798 (Though, since that's more of a semantic issue,
 799 it's not @emph{that} much preferable.)
 800
 801 Even sequences that could be parsed as digit strings could use location info,
 802 for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
 803 (This probably will be parsed as a character string,
 804 to be consistent with the parsing of @samp{Z'129A'}.)
 805
 806 To avoid the hassle of recording the location of the second character,
 807 while also preserving the general rule that each significant character
 808 is distinctly pointed to by the lexeme that contains it,
 809 it's best to simply not have any fixed-size lexemes
 810 larger than one character.
 811
This new design is expected to make checking for two
@samp{*} lexemes in a row much easier than it was under the old design,
so this is not much of a sacrifice.
It probably simplifies the lexer much more than it complicates the parser.
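The following C sketch, using a hypothetical @code{struct token}, shows the parser side of that trade: the power operator is recognized as two @samp{*} lexemes in a row, and the second @samp{*} keeps its own location for any diagnostic. Whether a free-form parser should also require the two lexemes to be adjacent is deliberately left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-character token with its own location.  */
struct token { char ch; int line; int column; };

/* If TOKS[K] and TOKS[K+1] are both '*', treat them as the power
   operator and point SECOND at the second '*' for diagnostics.  */
static bool
match_power (const struct token *toks, size_t n, size_t k,
             const struct token **second)
{
  if (k + 1 < n && toks[k].ch == '*' && toks[k + 1].ch == '*')
    {
      *second = &toks[k + 1];
      return true;
    }
  return false;
}
```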
 817
 818 @subsubsection Space-padding Lexemes
 819
 820 Certain lexemes need to be padded with virtual spaces when the
 821 end of the line (or file) is encountered.
 822
 823 This is necessary in fixed form, to handle lines that don't
 824 extend to column 72, assuming that's the line length in effect.
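A sketch of the padding, assuming a 72-column line length and a hypothetical helper @code{fixed_form_char} that the lexer would call instead of indexing the raw line buffer directly: columns past the physical end of the line read as virtual blanks, and columns past the assumed line length read as end of line.

```c
#include <string.h>

#define FIXED_LINE_LENGTH 72   /* assumed line length in effect */

/* Character in 1-based column COL of a fixed-form line: the real
   character if the line is that long, a virtual blank if the line
   ends early, or '\0' past FIXED_LINE_LENGTH (the rest is ignored).  */
static char
fixed_form_char (const char *line, int col)
{
  size_t len = strlen (line);

  if (col < 1 || col > FIXED_LINE_LENGTH)
    return '\0';
  if ((size_t) col <= len)
    return line[col - 1];
  return ' ';                  /* virtual space padding */
}
```

With this, a character constant that runs off the end of a short line picks up the trailing blanks up to column 72, as it would if the line had been typed out in full.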
 825
 826 @subsubsection Bizarre Free-form Hollerith Constants
 827
 828 Last I checked, the Fortran 90 standard actually required the compiler
 829 to silently accept something like
 830
 831 @smallexample
 832 FORMAT ( 1 2   Htwelve chars )
 833 @end smallexample
 834
 835 as a valid @code{FORMAT} statement specifying a twelve-character
 836 Hollerith constant.
 837
 838 The implication here is that, since the new lexer is a zero-feedback one,
 839 it won't know that the special case of a @code{FORMAT} statement being parsed
 840 requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
 841 a single lexeme.
 842
 843 (This is a horrible misfeature of the Fortran 90 language.
 844 It's one of many such misfeatures that almost make me want
 845 to not support them, and forge ahead with designing a new
 846 ``GNU Fortran'' language that has the features,
 847 but not the misfeatures, of Fortran 90,
 848 and provide utility programs to do the conversion automatically.)
 849
 850 So, the lexer must gather distinct chunks of decimal strings into
 851 a single lexeme in contexts where a single decimal lexeme might
 852 start a Hollerith constant.
 853
 854 (Which probably means it might as well do that all the time
 855 for all multi-character lexemes, even in free-form mode,
 856 leaving it to subsequent phases to pull them apart as they see fit.)
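The gathering step might look roughly like the following C sketch. @code{lex_hollerith} is a hypothetical helper, not an FFE function, and for brevity it folds digit gathering and Hollerith extraction into one routine that would only be called where a Hollerith constant may start. Blanks between digit chunks are skipped, while blanks inside the counted text are kept.

```c
#include <ctype.h>
#include <stddef.h>

/* Starting at S (say, just after "FORMAT ("), gather blank-separated
   digit chunks into a single count N, require H or h, then take exactly
   N characters, blanks included.  On success store the text and count
   and return a pointer just past the constant; otherwise return NULL.  */
static const char *
lex_hollerith (const char *s, const char **text, size_t *count)
{
  size_t n = 0;
  int saw_digit = 0;

  for (;; s++)
    {
      if (*s == ' ')
        continue;                      /* blank between digit chunks */
      if (!isdigit ((unsigned char) *s))
        break;
      n = n * 10 + (size_t) (*s - '0');
      saw_digit = 1;
    }

  if (!saw_digit || n == 0 || (*s != 'H' && *s != 'h'))
    return NULL;
  s++;

  for (size_t k = 0; k < n; k++)
    if (s[k] == '\0')
      return NULL;                     /* prematurely terminated */

  *text = s;
  *count = n;
  return s + n;
}
```

Given @samp{ 1 2   Htwelve chars )}, this yields a count of 12 and the text @samp{twelve chars}.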
 857
 858 Compare the treatment of this to how
 859
 860 @smallexample
 861 CHARACTER * 4 5 HEY
 862 @end smallexample
 863
 864 and
 865
 866 @smallexample
 867 CHARACTER * 12 HEY
 868 @end smallexample
 869
must be treated: the former must be diagnosed, due to the separation
between lexemes, while the latter must be accepted as a proper declaration.
 872
 873 @subsubsection Hollerith Constants
 874
 875 Recognizing a Hollerith constant---specifically,
 876 that an @samp{H} or @samp{h} after a digit string begins
 877 such a constant---requires some knowledge of context.
 878
 879 Hollerith constants (such as @samp{2HAB}) can appear after:
 880
 881 @itemize @bullet
 882 @item
 883 @samp{(}
 884
 885 @item
 886 @samp{,}
 887
 888 @item
 889 @samp{=}
 890
 891 @item
 892 @samp{+}, @samp{-}, @samp{/}
 893
 894 @item
 895 @samp{*}, except as noted below
 896 @end itemize
 897
 898 Hollerith constants don't appear after:
 899
 900 @itemize @bullet
 901 @item
 902 @samp{CHARACTER*},
 903 which can be treated generally as
 904 any @samp{*} that is the second lexeme of a statement
 905 @end itemize
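The context rules above can be expressed as a small predicate. In this C sketch, @code{hollerith_may_follow} is hypothetical, and the caller is assumed to pass the text of the lexeme preceding the digit string and that lexeme's 1-based position in the statement.

```c
#include <stdbool.h>
#include <string.h>

/* May a digit string followed by H or h start a Hollerith constant?
   PREV is the lexeme just before the digit string; PREV_INDEX is its
   1-based position within the statement.  */
static bool
hollerith_may_follow (const char *prev, int prev_index)
{
  static const char *const starters[] = { "(", ",", "=", "+", "-", "/" };

  if (strcmp (prev, "*") == 0)
    return prev_index != 2;   /* CHARACTER*12HEY: a length, not 12HEY */

  for (size_t k = 0; k < sizeof starters / sizeof starters[0]; k++)
    if (strcmp (prev, starters[k]) == 0)
      return true;
  return false;
}
```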
 906
 907 @subsubsection Confusing Function Keyword
 908
 909 While
 910
 911 @smallexample
 912 REAL FUNCTION FOO ()
 913 @end smallexample
 914
 915 must be a @code{FUNCTION} statement and
 916
 917 @smallexample
 918 REAL FUNCTION FOO (5)
 919 @end smallexample
 920
must be a type-declaration statement,
 922
 923 @smallexample
 924 REAL FUNCTION FOO (@var{names})
 925 @end smallexample
 926
 927 where @var{names} is a comma-separated list of names,
 928 can be one or the other.
 929
 930 The only way to disambiguate that statement
 931 (short of mandating free-form source or a short maximum
length for names of external procedures)
 933 is based on the context of the statement.
 934
In particular, if the statement is known to be within an
 936 already-started program unit
 937 (but not at the outer level of the @code{CONTAINS} block),
 938 it is a type-declaration statement.
 939
 940 Otherwise, the statement is a @code{FUNCTION} statement,
 941 in that it begins a function program unit
 942 (external, or, within @code{CONTAINS}, nested).
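A C sketch of that decision follows, under the assumption that the parser has already classified the parenthesized list and knows where it is in the program; all names here (@code{classify_real_function} and the enumerations) are hypothetical.

```c
#include <stdbool.h>

enum stmt_kind { STMT_FUNCTION, STMT_TYPE_DECL };

/* What the parenthesized list after REAL FUNCTION FOO holds.  */
enum arg_list_kind { ARGS_EMPTY, ARGS_NAMES, ARGS_OTHER };

/* Classify REAL FUNCTION FOO (...).  IN_STARTED_UNIT is true when a
   program unit has already begun; AT_CONTAINS_OUTER_LEVEL is true at
   the outer level of a CONTAINS block.  */
static enum stmt_kind
classify_real_function (enum arg_list_kind args, bool in_started_unit,
                        bool at_contains_outer_level)
{
  if (args == ARGS_EMPTY)
    return STMT_FUNCTION;       /* REAL FUNCTION FOO () */
  if (args == ARGS_OTHER)
    return STMT_TYPE_DECL;      /* REAL FUNCTION FOO (5) */

  /* A list of names is ambiguous; only context decides.  */
  if (in_started_unit && !at_contains_outer_level)
    return STMT_TYPE_DECL;
  return STMT_FUNCTION;         /* begins an external or nested function */
}
```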
 943
 944 @subsubsection Weird READ
 945
 946 The statement
 947
 948 @smallexample
 949 READ (N)
 950 @end smallexample
 951
 952 is equivalent to either
 953
 954 @smallexample
 955 READ (UNIT=(N))
 956 @end smallexample
 957
 958 or
 959
 960 @smallexample
 961 READ (FMT=(N))
 962 @end smallexample
 963
 964 depending on which would be valid in context.
 965
 966 Specifically, if @samp{N} is type @code{INTEGER},
 967 @samp{READ (FMT=(N))} would not be valid,
 968 because parentheses may not be used around @samp{N},
whereas they may be used around it in @samp{READ (UNIT=(N))}.
 970
 971 Further, if @samp{N} is type @code{CHARACTER},
 972 the opposite is true---@samp{READ (UNIT=(N))} is not valid,
 973 but @samp{READ (FMT=(N))} is.
 974
 975 Strictly speaking, if anything follows
 976
 977 @smallexample
 978 READ (N)
 979 @end smallexample
 980
 981 in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
 983 without looking at the type of @samp{N},
 984 because the comma is required for the @samp{READ (FMT=(N))}
 985 interpretation and disallowed for the @samp{READ (UNIT=(N))}
 986 interpretation.
 987
 988 However, in practice, many Fortran compilers allow
 989 the comma for the @samp{READ (UNIT=(N))}
 990 interpretation anyway
 991 (in that they generally allow a leading comma before
 992 an I/O list in an I/O statement),
 993 and much code takes advantage of this allowance.
 994
 995 (This is quite a reasonable allowance, since the
 996 juxtaposition of a comma-separated list immediately
 997 after an I/O control-specification list, which is also comma-separated,
 998 without an intervening comma,
 999 looks sufficiently ``wrong'' to programmers
1000 that they can't resist the itch to insert the comma.
1001 @samp{READ (I, J), K, L} simply looks cleaner than
1002 @samp{READ (I, J) K, L}.)
1003
1004 So, type-based disambiguation is needed unless strict adherence
1005 to the standard is always assumed, and we're not going to assume that.
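Reduced to its core, the type-based choice might look like this C sketch. The enumerations and @code{classify_read_paren} are hypothetical, and treating any other type of @samp{N} as invalid is an assumption of the sketch. It deliberately ignores whether a comma follows the close parenthesis, for the reason given above.

```c
enum basic_type { TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER };
enum read_form { READ_UNIT, READ_FMT, READ_INVALID };

/* Decide what READ (N) means from the type of N alone.  */
static enum read_form
classify_read_paren (enum basic_type n_type)
{
  switch (n_type)
    {
    case TYPE_INTEGER:
      return READ_UNIT;       /* as if READ (UNIT=(N)) */
    case TYPE_CHARACTER:
      return READ_FMT;        /* as if READ (FMT=(N)) */
    default:
      return READ_INVALID;    /* assumption: diagnose anything else */
    }
}
```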
1006
1007 @node TBD (Transforming)
1008 @subsection TBD (Transforming)
1009
1010 Continue researching gotchas, designing the transformational process,
1011 and implementing it.
1012
1013 Specific issues to resolve:
1014
1015 @itemize @bullet
1016 @item
Just where should @code{USE} processing take place (if it were implemented)?
1018
1019 This gets into the whole issue of how @code{g77} should handle the concept
1020 of modules.
I think GNAT already takes on this issue, but I don't know more than that.
1022 Jim Giles has written extensively on @code{comp.lang.fortran}
1023 about his opinions on module handling, as have others.
1024 Jim's views should be taken into account.
1025
1026 Actually, Richard M. Stallman (RMS) also has written up
1027 some guidelines for implementing such things,
1028 but I'm not sure where I read them.
1029 Perhaps the old @email{gcc2@@cygnus.com} list.
1030
1031 If someone could dig references to these up and get them to me,
1032 that would be much appreciated!
1033 Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.
1036
1037 @item
1038 Should the @code{g77} command become just a script that invokes
1039 all the various preprocessing that might be needed,
1040 thus making it seem slower than necessary for legacy code
1041 that people are unwilling to convert,
1042 or should we provide a separate script for that,
1043 thus encouraging people to convert their code once and for all?
1044
At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding script that converts source code,
perhaps named @code{g77oldnew}.
1049
1050 These scripts would take all the pertinent options @code{g77} used
1051 to take and run the appropriate filters,
1052 passing the results to @code{g77} or just making new sources out of them
1053 (in a subdirectory, leaving the user to do the dirty deed of
1054 moving or copying them over the old sources).
1055
1056 @item
1057 Do other Fortran compilers provide a prefix syntax
1058 to govern the treatment of backslashes in @code{CHARACTER}
1059 (or Hollerith) constants?
1060
1061 Knowing what other compilers provide would help.
1062
1063 @item
1064 Is it okay to drop support for the @samp{-fintrin-case-initcap},
1065 @samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
1066 and @samp{-fcase-initcap} options?
1067
1068 I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
1069 Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
1071
1072 The consensus to date (1999-11-17) has been to drop this support.
In fact, I can't recall anybody saying they're using it.
1074 @end itemize
1075
1076 @node Philosophy of Code Generation
1077 @section Philosophy of Code Generation
1078
1079 Don't poke the bear.
1080
1081 The @code{g77} front end generates code
1082 via the @code{gcc} back end.
1083
1084 @cindex GNU Back End (GBE)
1085 @cindex GBE
1086 @cindex @code{gcc}, back end
1087 @cindex back end, gcc
1088 @cindex code generator
1089 The @code{gcc} back end (GBE) is a large, complex
1090 labyrinth of intricate code
1091 written in a combination of the C language
1092 and specialized languages internal to @code{gcc}.
1093
1094 While the @emph{code} that implements the GBE
1095 is written in a combination of languages,
1096 the GBE itself is,
1097 to the front end for a language like Fortran,
1098 best viewed as a @emph{compiler}
1099 that compiles its own, unique, language.
1100
1101 The GBE's ``source'', then, is written in this language,
1102 which consists primarily of
1103 a combination of calls to GBE functions
1104 and @dfn{tree} nodes
1105 (which are, themselves, created
1106 by calling GBE functions).
1107
So, @code{g77} generates code by, in effect,
1109 translating the Fortran code it reads
1110 into a form ``written'' in the ``language''
1111 of the @code{gcc} back end.
1112
1113 @cindex GBEL
1114 @cindex GNU Back End Language (GBEL)
This language will hereafter be referred to as @dfn{GBEL},
1116 for GNU Back End Language.
1117
1118 GBEL is an evolving language,
1119 not fully specified in any published form
1120 as of this writing.
1121 It offers many facilities,
1122 but its ``core'' facilities
are those that correspond most directly
1124 to those needed to support @code{gcc}
1125 (compiling code written in GNU C).
1126
1127 The @code{g77} Fortran Front End (FFE)
1128 is designed and implemented
1129 to navigate the currents and eddies
1130 of ongoing GBEL and @code{gcc} development
1131 while also delivering on the potential
1132 of an integrated FFE
1133 (as compared to using a converter like @code{f2c}
1134 and feeding the output into @code{gcc}).
1135
1136 Goals of the FFE's code-generation strategy include:
1137
1138 @itemize @bullet
1139 @item
1140 High likelihood of generation of correct code,
1141 or, failing that, producing a fatal diagnostic or crashing.
1142
1143 @item
1144 Generation of highly optimized code,
1145 as directed by the user
1146 via GBE-specific (versus @code{g77}-specific) constructs,
1147 such as command-line options.
1148
1149 @item
1150 Fast overall (FFE plus GBE) compilation.
1151
1152 @item
1153 Preservation of source-level debugging information.
1154 @end itemize
1155
1156 The strategies historically, and currently, used by the FFE
1157 to achieve these goals include:
1158
1159 @itemize @bullet
1160 @item
1161 Use of GBEL constructs that most faithfully encapsulate
1162 the semantics of Fortran.
1163
1164 @item
1165 Avoidance of GBEL constructs that are so rarely used,
1166 or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
1168 as sufficient for use by the FFE.
1169
1170 @item
1171 Flexible design, to readily accommodate changes to specific
1172 code-generation strategies, perhaps governed by command-line options.
1173 @end itemize
1174
1175 @cindex Bear-poking
1176 @cindex Poking the bear
1177 ``Don't poke the bear'' somewhat summarizes the above strategies.
1178 The GBE is the bear.
1179 The FFE is designed and implemented to avoid poking it
1180 in ways that are likely to just annoy it.
1181 The FFE usually either tackles it head-on,
1182 or avoids treating it in ways dissimilar to how
1183 the @code{gcc} front end treats it.
1184
1185 For example, the FFE uses the native array facility in the back end
1186 instead of the lower-level pointer-arithmetic facility
used by @code{gcc} when compiling @code{f2c} output.
1188 Theoretically, this presents more opportunities for optimization,
1189 faster compile times,
1190 and the production of more faithful debugging information.
1191 These benefits were not, however, immediately realized,
1192 mainly because @code{gcc} itself makes little or no use
1193 of the native array facility.
1194
1195 Complex arithmetic is a case study of the evolution of this strategy.
1196 When originally implemented,
1197 the GBEL had just evolved its own native complex-arithmetic facility,
1198 so the FFE took advantage of that.
1199
1200 When porting @code{g77} to 64-bit systems,
1201 it was discovered that the GBE didn't really
1202 implement its native complex-arithmetic facility properly.
1203
1204 The short-term solution was to rewrite the FFE
1205 to instead use the lower-level facilities
1206 that'd be used by @code{gcc}-compiled code
1207 (assuming that code, itself, didn't use the native complex type
1208 provided, as an extension, by @code{gcc}),
1209 since these were known to work,
1210 and, in any case, if shown to not work,
1211 would likely be rapidly fixed
1212 (since they'd likely not work for vanilla C code in similar circumstances).
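To make the two approaches concrete, here is a C sketch contrasting them for complex multiplication. The emulated version stores @code{COMPLEX} as a plain structure of two @code{float}s (in the style of @code{f2c}'s @code{complex} type) and uses only real arithmetic, while the native version relies on C99 @code{_Complex}, standing in here for the back end's own complex facility. Both function names are hypothetical.

```c
#include <complex.h>

/* Emulated: COMPLEX as a structure of two floats, multiplied using
   only ordinary real arithmetic.  */
typedef struct { float r, i; } complex_pair;

static complex_pair
cmul_emulated (complex_pair a, complex_pair b)
{
  complex_pair c;

  c.r = a.r * b.r - a.i * b.i;
  c.i = a.r * b.i + a.i * b.r;
  return c;
}

/* Native: let the compiler's own complex type do the work.  */
static float _Complex
cmul_native (float _Complex a, float _Complex b)
{
  return a * b;
}
```

For ordinary finite operands both produce the same value; the difference lies in how much the back end's own complex support is exercised.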
1213
1214 However, the rewrite accommodated the original, native approach as well
1215 by offering a command-line option to select it over the emulated approach.
1216 This allowed users, and especially GBE maintainers, to try out
1217 fixes to complex-arithmetic support in the GBE
1218 while @code{g77} continued to default to compiling more code correctly,
1219 albeit producing (typically) slower executables.
1220
1221 As of April 1999, it appeared that the last few bugs
1222 in the GBE's support of its native complex-arithmetic facility
1223 were worked out.
1224 The FFE was changed back to default to using that native facility,
1225 leaving emulation as an option.
1226
1227 Later during the release cycle
1228 (which was called EGCS 1.2, but soon became GCC 2.95),
1229 bugs in the native facility were found.
1230 Reactions among various people included
1231 ``the last thing we should do is change the default back'',
1232 ``we must change the default back'',
1233 and ``let's figure out whether we can narrow down the bugs to
1234 few enough cases to allow the now-months-long-tested default
1235 to remain the same''.
1236 The latter viewpoint won that particular time.
1237 The bugs exposed other concerns regarding ABI compliance
1238 when the ABI specified treatment of complex data as different
1239 from treatment of what Fortran and GNU C consider the equivalent
1240 aggregation (structure) of real (or float) pairs.
1241
1242 Other Fortran constructs---arrays, character strings,
1243 complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
1244 and so on---involve issues similar to those pertaining to complex arithmetic.
1245
1246 So, it is possible that the history
1247 of how the FFE handled complex arithmetic
1248 will be repeated, probably in modified form
1249 (and hopefully over shorter timeframes),
1250 for some of these other facilities.
1251
1252 @node Two-pass Design
1253 @section Two-pass Design
1254
1255 The FFE does not tell the GBE anything about a program unit
1256 until after the last statement in that unit has been parsed.
1257 (A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
1259 That is, a program unit in Fortran is like a top-level function in C.
1260 Nested functions, found among the extensions offered by GNU C,
1261 correspond roughly to Fortran's statement functions.)
1262
1263 So, while parsing the code in a program unit,
1264 the FFE saves up all the information
1265 on statements, expressions, names, and so on,
1266 until it has seen the last statement.
1267
1268 At that point, the FFE revisits the saved information
1269 (in what amounts to a second @dfn{pass} over the program unit)
1270 to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.
1272
1273 Some lookahead is performed during this second pass,
1274 so the FFE could be viewed as a ``two-plus-pass'' design.
1275
1276 @menu
1277 * Two-pass Code::
1278 * Why Two Passes::
1279 @end menu
1280
1281 @node Two-pass Code
1282 @subsection Two-pass Code
1283
1284 Most of the code that turns the first pass (parsing)
1285 into a second pass for code generation
1286 is in @file{@value{path-g77}/std.c}.
1287
1288 It has external functions,
1289 called mainly by siblings in @file{@value{path-g77}/stc.c},
1290 that record the information on statements and expressions
1291 in the order they are seen in the source code.
1292 These functions save that information.
1293
1294 It also has an external function that revisits that information,
1295 calling the siblings in @file{@value{path-g77}/ste.c},
which handle the actual code generation
1297 (by generating GBEL code,
1298 that is, by calling GBE routines
1299 to represent and specify expressions, statements, and so on).
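The record-then-revisit shape of this code can be sketched in a few lines of C. Everything here (@code{struct stmt}, @code{save_stmt}, @code{revisit_stmts}) is hypothetical and far simpler than the real FFE; it only shows why deferring code generation helps: an @code{ASSIGN} seen before its label's definition is still translated with the label's final nature.

```c
#include <string.h>

enum label_kind { LABEL_UNKNOWN, LABEL_FORMAT, LABEL_EXEC };

/* One digested statement, as saved during pass 1.  */
struct stmt {
  int is_assign;            /* nonzero for ASSIGN n TO var */
  int label;                /* label referenced (ASSIGN) or defined */
  enum label_kind defines;  /* kind of label this statement defines */
};

#define MAX_STMTS 64
#define MAX_LABEL 100000    /* labels assumed already checked: 1..99999 */

static struct stmt saved[MAX_STMTS];
static int n_saved;

/* Pass 1: called as each statement is parsed; nothing is generated.  */
static void
save_stmt (struct stmt s)
{
  if (n_saved < MAX_STMTS)
    saved[n_saved++] = s;
}

/* Pass 2: settle what every label turned out to be, then "generate
   code" for each saved ASSIGN, reporting the label kind it used.  */
static int
revisit_stmts (enum label_kind out[], int max_out)
{
  static enum label_kind kinds[MAX_LABEL];
  int n_out = 0;

  memset (kinds, 0, sizeof kinds);   /* all LABEL_UNKNOWN */
  for (int k = 0; k < n_saved; k++)
    if (!saved[k].is_assign && saved[k].defines != LABEL_UNKNOWN)
      kinds[saved[k].label] = saved[k].defines;

  for (int k = 0; k < n_saved && n_out < max_out; k++)
    if (saved[k].is_assign)
      out[n_out++] = kinds[saved[k].label];
  return n_out;
}
```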
1300
1301 @node Why Two Passes
1302 @subsection Why Two Passes
1303
1304 The need for two passes was not immediately evident
1305 during the design and implementation of the code in the FFE
1306 that was to produce GBEL.
1307 Only after a few kludges,
1308 to handle things like incorrectly-guessed @code{ASSIGN} label nature,
1309 had been implemented,
1310 did enough evidence pile up to make it clear
1311 that @file{std.c} had to be introduced to intercept,
1312 save, then revisit as part of a second pass,
1313 the digested contents of a program unit.
1314
1315 Other such missteps have occurred during the evolution of the FFE,
1316 because of the different goals of the FFE and the GBE.
1317
1318 Because the GBE's original, and still primary, goal
1319 was to directly support the GNU C language,
1320 the GBEL, and the GBE itself,
1321 requires more complexity
1322 on the part of most front ends
1323 than it requires of @code{gcc}'s.
1324
1325 For example,
1326 the GBEL offers an interface that permits the @code{gcc} front end
1327 to implement most, or all, of the language features it supports,
1328 without the front end having to
1329 make use of non-user-defined variables.
1330 (It's almost certainly the case that all of K&R C,
1331 and probably ANSI C as well,
1332 is handled by the @code{gcc} front end
1333 without declaring such variables.)
1334
1335 The FFE, on the other hand, must resort to a variety of ``tricks''
1336 to achieve its goals.
1337
1338 Consider the following C code:
1339
1340 @smallexample
1341 int
1342 foo (int a, int b)
1343 @{
1344   int c = 0;
1345
1346   if ((c = bar (c)) == 0)
1347     goto done;
1348
1349   quux (c << 1);
1350
1351 done:
1352   return c;
1353 @}
1354 @end smallexample
1355
1356 Note what kinds of objects are declared, or defined, before their use,
1357 and before any actual code generation involving them
1358 would normally take place:
1359
1360 @itemize @bullet
1361 @item
1362 Return type of function
1363
1364 @item
1365 Entry point(s) of function
1366
1367 @item
1368 Dummy arguments
1369
1370 @item
1371 Variables
1372
1373 @item
1374 Initial values for variables
1375 @end itemize
1376
By contrast, the following items can, and do,
1378 suddenly appear ``out of the blue'' in C:
1379
1380 @itemize @bullet
1381 @item
1382 Label references
1383
1384 @item
1385 Function references
1386 @end itemize
1387
1388 Not surprisingly, the GBE faithfully permits the latter set of items
1389 to be ``discovered'' partway through GBEL ``programs'',
1390 just as they are permitted to in C.
1391
1392 Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
1394 of items in the former set.
1395
1396 This makes Fortran a poor fit for the ``safe'' subset of GBEL.
1397 Consider:
1398
1399 @smallexample
1400       FUNCTION X (A, ARRAY, ID1)
1401       CHARACTER*(*) A
1402       DOUBLE PRECISION X, Y, Z, TMP, EE, PI
1403       REAL ARRAY(ID1*ID2)
1404       COMMON ID2
1405       EXTERNAL FRED
1406
1407       ASSIGN 100 TO J
1408       CALL FOO (I)
1409       IF (I .EQ. 0) PRINT *, A(0)
1410       GOTO 200
1411
1412       ENTRY Y (Z)
1413       ASSIGN 101 TO J
1414 200   PRINT *, A(1)
1415       READ *, TMP
1416       GOTO J
1417 100   X = TMP * EE
1418       RETURN
1419 101   Y = TMP * PI
1420       CALL FRED
1421       DATA EE, PI /2.71D0, 3.14D0/
1422       END
1423 @end smallexample
1424
1425 Here are some observations about the above code,
1426 which, while somewhat contrived,
1427 conforms to the FORTRAN 77 and Fortran 90 standards:
1428
1429 @itemize @bullet
1430 @item
1431 The return type of function @samp{X} is not known
1432 until the @samp{DOUBLE PRECISION} line has been parsed.
1433
1434 @item
1435 Whether @samp{A} is a function or a variable
1436 is not known until the @samp{PRINT *, A(0)} statement
1437 has been parsed.
1438
1439 @item
1440 The bounds of the array of argument @samp{ARRAY}
1441 depend on a computation involving
1442 the subsequent argument @samp{ID1}
1443 and the blank-common member @samp{ID2}.
1444
1445 @item
1446 Whether @samp{Y} and @samp{Z} are local variables,
1447 additional function entry points,
1448 or dummy arguments to additional entry points
1449 is not known
1450 until the @code{ENTRY} statement is parsed.
1451
1452 @item
1453 Similarly, whether @samp{TMP} is a local variable is not known
1454 until the @samp{READ *, TMP} statement is parsed.
1455
1456 @item
1457 The initial values for @samp{EE} and @samp{PI}
1458 are not known until after the @code{DATA} statement is parsed.
1459
1460 @item
1461 Whether @samp{FRED} is a function returning type @code{REAL}
1462 or a subroutine
1463 (which can be thought of as returning type @code{void}
1464 @emph{or}, to support alternate returns in a simple way,
1465 type @code{int})
1466 is not known
1467 until the @samp{CALL FRED} statement is parsed.
1468
1469 @item
1470 Whether @samp{100} is a @code{FORMAT} label
1471 or the label of an executable statement
1472 is not known
1473 until the @samp{X =} statement is parsed.
1474 (These two types of labels get @emph{very} different treatment,
1475 especially when @code{ASSIGN}'ed.)
1476
1477 @item
1478 That @samp{J} is a local variable is not known
1479 until the first @code{ASSIGN} statement is parsed.
1480 (This happens @emph{after} executable code has been seen.)
1481 @end itemize
1482
1483 Very few of these ``discoveries''
1484 can be accommodated by the GBE as it has evolved over the years.
1485 The GBEL doesn't support several of them,
1486 and those it might appear to support
1487 don't always work properly,
1488 especially in combination with other GBEL and GBE features,
1489 as implemented in the GBE.
1490
(Had the GBE and its GBEL originally evolved to support @code{g77},
the shoe would be on the other foot, so to speak: most, if not all,
of the above would be directly supported by the GBEL,
and a few C constructs that are supported in reality
would probably not be.
Both this mythical GBE and today's real one cater to their GBELs
by sometimes scrambling around, cleaning up after themselves
after discovering that assumptions they made earlier during code generation
are incorrect.
That's not a great design, since it indicates significant code
paths that might be rarely tested but used in some key production
environments.)
1503
1504 So, the FFE handles these discrepancies---between the order in which
1505 it discovers facts about the code it is compiling,
1506 and the order in which the GBEL and GBE support such discoveries---by
1507 performing what amounts to two
1508 passes over each program unit.
1509
1510 (A few ambiguities can remain at that point,
1511 such as whether, given @samp{EXTERNAL BAZ}
1512 and no other reference to @samp{BAZ} in the program unit,
1513 it is a subroutine, a function, or a block-data---which, in C-speak,
1514 governs its declared return type.
1515 Fortunately, these distinctions are easily finessed
1516 for the procedure, library, and object-file interfaces
1517 supported by @code{g77}.)
1518
1519 @node Challenges Posed
1520 @section Challenges Posed
1521
1522 Consider the following Fortran code, which uses various extensions
1523 (including some to Fortran 90):
1524
1525 @smallexample
1526 SUBROUTINE X(A)
1527 CHARACTER*(*) A
1528 COMPLEX CFUNC
1529 INTEGER*2 CLOCKS(200)
1530 INTEGER IFUNC
1531
CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
END
1533 @end smallexample
1534
1535 The above poses the following challenges to any Fortran compiler
1536 that uses run-time interfaces, and a run-time library, roughly similar
1537 to those used by @code{g77}:
1538
1539 @itemize @bullet
1540 @item
1541 Assuming the library routine that supports @code{SYSTEM_CLOCK}
1542 expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
1543 the compiler must make available to it a temporary variable of that type.
1544
1545 @item
1546 Further, after the @code{SYSTEM_CLOCK} library routine returns,
1547 the compiler must ensure that the temporary variable it wrote
1548 is copied into the appropriate element of the @samp{CLOCKS} array.
1549 (This assumes the compiler doesn't just reject the code,
1550 which it should if it is compiling under some kind of a ``strict'' option.)
1551
1552 @item
To determine the correct index into the @samp{CLOCKS} array
1554 (putting aside the fact that the index, in this particular case,
1555 need not be computed until after
1556 the @code{SYSTEM_CLOCK} library routine returns),
1557 the compiler must ensure that the @code{IFUNC} function is called.
1558
1559 That requires evaluating its argument,
1560 which requires, for @code{g77}
1561 (assuming @code{-ff2c} is in force),
1562 reserving a temporary variable of type @code{COMPLEX}
1563 for use as a repository for the return value
1564 being computed by @samp{CFUNC}.
1565
1566 @item
1567 Before invoking @samp{CFUNC},
its argument must be evaluated,
1569 which requires allocating, at run time,
1570 a temporary large enough to hold the result of the concatenation,
1571 as well as actually performing the concatenation.
1572
1573 @item
1574 The large temporary needed during invocation of @code{CFUNC}
1575 should, ideally, be deallocated
1576 (or, at least, left to the GBE to dispose of, as it sees fit)
1577 as soon as @code{CFUNC} returns,
1578 which means before @code{IFUNC} is called
1579 (as it might need a lot of dynamically allocated memory).
1580 @end itemize
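A hand translation into C may make the ordering clearer. This is only a sketch under stated assumptions: @code{cfunc}, @code{ifunc}, and @code{system_clock_count} are hypothetical stand-ins with made-up behavior (not the @code{libg2c} interface), @code{COMPLEX} is modeled as a two-@code{float} structure returned through a pointer as with @samp{-ff2c}, and @code{INTEGER*2} is assumed to map to @code{short}. Nested blocks keep each temporary alive only as long as needed, and the concatenation buffer is freed before @code{ifunc} runs.

```c
#include <stdlib.h>
#include <string.h>

typedef struct { float r, i; } complex_t;     /* -ff2c-style COMPLEX */

/* Hypothetical stand-ins with made-up behavior.  */
static void
cfunc (complex_t *ret, const char *s, size_t len)
{
  (void) s;
  ret->r = (float) len;     /* pretend: real part is the argument length */
  ret->i = 0.0f;
}

static int
ifunc (const complex_t *c)
{
  return (int) c->r;        /* pretend: subscript from the real part */
}

static void
system_clock_count (int *count)   /* sets an INTEGER*4 */
{
  *count = 12345;
}

/* CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))  */
void
x_ (short clocks[200], const char *a, size_t a_len)
{
  int index;

  {
    complex_t ctemp;                 /* CFUNC's return-value temporary */

    {
      size_t len = a_len + 2;
      char *cat = malloc (len);      /* run-time-sized '('//A//')' */

      if (cat == NULL)
        abort ();
      cat[0] = '(';
      memcpy (cat + 1, a, a_len);
      cat[len - 1] = ')';
      cfunc (&ctemp, cat, len);
      free (cat);                    /* released before IFUNC is called */
    }
    index = ifunc (&ctemp);
  }

  {
    int count4;                      /* INTEGER*4 temporary for COUNT */

    system_clock_count (&count4);
    clocks[index - 1] = (short) count4;   /* copy back into INTEGER*2 */
  }
}
```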
1581
1582 @code{g77} currently doesn't support all of the above,
1583 but, so that it might someday, it has evolved to handle
1584 at least some of the above requirements.
1585
1586 Meeting the above requirements is made more challenging
by the need to conform to the requirements of the GBEL/GBE combination.
1588
1589 @node Transforming Statements
1590 @section Transforming Statements
1591
1592 Most Fortran statements are given their own block,
1593 and, for temporary variables they might need, their own scope.
1594 (A block is what distinguishes @samp{@{ foo (); @}}
1595 from just @samp{foo ();} in C.
1596 A scope is included with every such block,
1597 providing a distinct name space for local variables.)
1598
1599 Label definitions for the statement precede this block,
1600 so @samp{10 PRINT *, I} is handled more like
1601 @samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
1602 (where @samp{fl10} is just a notation meaning ``Fortran Label 10''
1603 for the purposes of this document).
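As a sketch of that layout, here is a hand translation of a three-statement Fortran fragment into C, with hypothetical names throughout. Each statement becomes its own block, the label @samp{fl10} precedes its statement's block, and a statement-local temporary lives only inside that block.

```c
/* Hand translation (hypothetical) of the fixed-form fragment
         I = 0
   10    I = I + 1
         IF (I .LT. 3) GOTO 10
   with each Fortran statement in its own block.  */
static int
count_up (void)
{
  int i;

  { i = 0; }

fl10:                        /* label precedes the statement's block */
  {
    int temp0 = i + 1;       /* statement-local temporary */
    i = temp0;
  }

  { if (i < 3) goto fl10; }

  return i;
}
```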
1604
1605 @menu
1606 * Statements Needing Temporaries::
1607 * Transforming DO WHILE::
1608 * Transforming Iterative DO::
1609 * Transforming Block IF::
1610 * Transforming SELECT CASE::
1611 @end menu
1612
1613 @node Statements Needing Temporaries
1614 @subsection Statements Needing Temporaries
1615
1616 Any temporaries needed during, but not beyond,
1617 execution of a Fortran statement,
1618 are made local to the scope of that statement's block.
1619
1620 This allows the GBE to share storage for these temporaries
1621 among the various statements without the FFE
1622 having to manage that itself.
1623
1624 (The GBE could, of course, decide to optimize
1625 management of these temporaries.
1626 For example, it could, theoretically,
1627 schedule some of the computations involving these temporaries
1628 to occur in parallel.
1629 More practically, it might leave the storage for some temporaries
1630 ``live'' beyond their scopes, to reduce the number of
1631 manipulations of the stack pointer at run time.)
1632
1633 Temporaries needed across distinct statement boundaries usually
1634 are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
1635 (Also, there might be temporaries not associated with blocks at all---these
1636 would be in the scope of the entire program unit.)
1637
1638 Each Fortran block @emph{should} get its own block/scope in the GBE.
1639 This is best, because it allows temporaries to be more naturally handled.
1640 However, it might pose problems when handling labels
1641 (in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and it generally means the hassle of replicating
1643 parts of the @code{gcc} front end
1644 (because the FFE needs to support
1645 an arbitrary number of nested back-end blocks
1646 if each Fortran block gets one).
1647
1648 So, there might still be a need for top-level temporaries, whose
1649 ``owning'' scope is that of the containing procedure.
1650
Also, there seem to be problems declaring new variables after
1652 generating code (within a block) in the back end, leading to, e.g.,
1653 @samp{label not defined before binding contour} or similar messages,
1654 when compiling with @samp{-fstack-check} or
1655 when compiling for certain targets.
1656
1657 Because of that, and because sometimes these temporaries are not
discovered until the middle of generating code for an expression
1659 statement (as in the case of the optimization for @samp{X**I}),
1660 it seems best to always
1661 pre-scan all the expressions that'll be expanded for a block
1662 before generating any of the code for that block.
1663
1664 This pre-scan then handles discovering and declaring, to the back end,
1665 the temporaries needed for that block.
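A minimal C sketch of such a pre-scan follows, over a hypothetical expression tree. The assumption that each concatenation and each @samp{X**I} needs exactly one temporary is illustrative only; the point is that the count is known, and the temporaries can be declared, before any code for the block is emitted.

```c
#include <stddef.h>

/* Hypothetical expression tree: only the node kinds the sketch needs.  */
enum node_kind { NODE_VAR, NODE_CONCAT, NODE_POWER };

struct node {
  enum node_kind kind;
  const struct node *left, *right;
};

/* Pre-scan: count the temporaries an expression will need, assuming
   one per concatenation and one per X**I, so they can all be declared
   to the back end before code generation for the block starts.  */
static size_t
count_temporaries (const struct node *e)
{
  size_t here;

  if (e == NULL)
    return 0;
  here = (e->kind == NODE_CONCAT || e->kind == NODE_POWER) ? 1 : 0;
  return here + count_temporaries (e->left) + count_temporaries (e->right);
}
```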
1666
1667 It's also important to treat distinct items in an I/O list as distinct
1668 statements deserving their own blocks.
1669 That's because there's a requirement
1670 that each I/O item be fully processed before the next one,
1671 which matters in cases like @samp{READ (*,*), I, A(I)}---the
1672 element of @samp{A} read in the second item
1673 @emph{must} be determined from the value
1674 of @samp{I} read in the first item.
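Sketched in C, with the values that would be read supplied in an array and hypothetical names throughout, the two items become two blocks, and the subscript for the second item is computed only after the first has stored into @samp{I}.

```c
/* Hand translation of  READ (*,*) I, A(I)  where INPUT holds the two
   values that would be read.  Assumes 1 <= I <= 10 after item 1.  */
static void
read_i_then_a_of_i (const int input[2], int *i, int a[10])
{
  { *i = input[0]; }               /* item 1: I */

  {                                /* item 2: A(I), using the new I */
    int subscript = *i;

    a[subscript - 1] = input[1];
  }
}
```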
1675
1676 @node Transforming DO WHILE
1677 @subsection Transforming DO WHILE
1678
1679 @samp{DO WHILE(expr)} @emph{must} be implemented
1680 so that temporaries needed to evaluate @samp{expr}
1681 are generated just for the test, each time.
1682
1683 Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
1684
1685 @smallexample
1686 for (;;)
1687   @{
1688     int temp0;
1689
1690     @{
1691       char temp1[large];
1692
1693       libg77_catenate (temp1, a, b);
1694       temp0 = libg77_ne (temp1, 'END');
1695     @}
1696
1697     if (! temp0)
1698       break;
1699
1700     @dots{}
1701   @}
1702 @end smallexample
1703
1704 In this case, it seems like a time/space tradeoff
1705 between allocating and deallocating @samp{temp1} for each iteration
1706 and allocating it just once for the entire loop.
1707
1708 However, if @samp{temp1} is allocated just once for the entire loop,
1709 it could be the wrong size for subsequent iterations of that loop
1710 in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
1711 because the body of the loop might modify @samp{I} or @samp{J}.
1712
So, the above implementation is used,
though a more efficient one could be used
in specific circumstances,
such as when no temporary's size can change between iterations.
1716
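The pseudo-code above uses @samp{large} for the size of @samp{temp1}. The C sketch below makes the per-iteration sizing concrete for @samp{DO WHILE (A(I:J)//B .NE. 'END')}. It uses a C99 variable-length array that is sized from the current values of @samp{I} and @samp{J} on every trip. The loop body, which just decrements @samp{J}, and the function name are made up for illustration. @code{strcmp} stands in for the Fortran comparison, which would also blank-pad the shorter operand.

@smallexample
#include <string.h>

/* Sketch of DO WHILE (A(I:J)//B .NE. 'END') with a made-up body
   that decrements J.  Returns how many times the body ran; the
   caller must pick inputs for which the loop terminates.  */
int
do_while_sketch (const char *a, int i, int j, const char *b)
@{
  int trips = 0;

  for (;;)
    @{
      int temp0;

      @{
        /* temp1 is a C99 variable-length array sized from the
           current I and J, so it always fits this iteration.  */
        size_t alen = j >= i ? (size_t) (j - i + 1) : 0;
        size_t blen = strlen (b);
        char temp1[alen + blen + 1];

        memcpy (temp1, a + i - 1, alen);
        memcpy (temp1 + alen, b, blen);
        temp1[alen + blen] = '\0';
        temp0 = strcmp (temp1, "END") != 0;
      @}                        /* temp1 goes out of scope here */

      if (! temp0)
        break;

      trips++;                  /* loop body: count the trip */
      j--;                      /* and modify J */
    @}
  return trips;
@}
@end smallexample

For example, @code{do_while_sketch ("ENDXYZ", 1, 6, "")} runs the body three times, and @samp{temp1} is shorter on each trip.
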
1717 @node Transforming Iterative DO
1718 @subsection Transforming Iterative DO
1719
1720 An iterative @code{DO} loop
1721 (one that specifies an iteration variable)
1722 is required by the Fortran standards
1723 to be implemented as though an iteration count
1724 is computed before entering the loop body,
1725 and that iteration count used to determine
1726 the number of times the loop body is to be performed
1727 (assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).
1728
1729 The FFE handles this by allocating a temporary variable
1730 to contain the computed number of iterations.
1731 Since this variable must be in a scope that includes the entire loop,
1732 a GBEL block is created for that loop,
1733 and the variable declared as belonging to the scope of that block.
1734
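A minimal C sketch of this scheme follows. It assumes the Fortran 77 rule that the iteration count for @samp{DO var = m1, m2, m3} is @samp{MAX(INT((m2-m1+m3)/m3),0)}. The function names are hypothetical. The block enclosing the loop plays the role of the GBEL block that owns the count.

@smallexample
/* Iteration count for DO var = m1, m2, m3 (m3 nonzero).
   C99 integer division truncates toward zero, like INT.  */
static int
do_trip_count (int m1, int m2, int m3)
@{
  int count = (m2 - m1 + m3) / m3;

  return count > 0 ? count : 0;
@}

/* Sketch of DO I = 1, N whose body doubles N.
   Returns the value of I after the loop completes.  */
int
do_loop_sketch (int n)
@{
  int i = 1;

  @{
    /* This block stands in for the GBEL block created for the loop;
       COUNT is the temporary holding the iteration count.  */
    int count = do_trip_count (1, n, 1);

    for (; count > 0; count--, i++)
      n *= 2;                  /* changing N does not change COUNT */
  @}
  return i;
@}
@end smallexample

With @samp{N} equal to 3, the body runs three times even though it doubles @samp{N} each time, so @code{do_loop_sketch (3)} returns 4.
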
1735 @node Transforming Block IF
1736 @subsection Transforming Block IF
1737
1738 Consider:
1739
1740 @smallexample
1741 SUBROUTINE X(A,B,C)
1742 CHARACTER*(*) A, B, C
1743 LOGICAL LFUNC
1744
1745 IF (LFUNC (A//B)) THEN
1746   CALL SUBR1
1747 ELSE IF (LFUNC (A//C)) THEN
1748   CALL SUBR2
1749 ELSE
1750   CALL SUBR3
END IF
END
1752 @end smallexample
1753
1754 The arguments to the two calls to @samp{LFUNC}
1755 require dynamic allocation (at run time),
1756 but are not required during execution of the @code{CALL} statements.
1757
1758 So, the scopes of those temporaries must be within blocks inside
1759 the block corresponding to the Fortran @code{IF} block.
1760
1761 This cannot be represented ``naturally''
1762 in vanilla C, nor in GBEL.
1763 The @code{if}, @code{elseif}, @code{else},
1764 and @code{endif} constructs
1765 provided by both languages must,
1766 for a given @code{if} block,
1767 share the same C/GBE block.
1768
1769 Therefore, any temporaries needed during evaluation of @samp{expr}
1770 while executing @samp{ELSE IF(expr)}
1771 must either have been predeclared
1772 at the top of the corresponding @code{IF} block,
1773 or declared within a new block for that @code{ELSE IF}---a block that,
1774 since it cannot contain the @code{else} or @code{else if} itself
1775 (due to the above requirement),
1776 actually implements the rest of the @code{IF} block's
1777 @code{ELSE IF} and @code{ELSE} statements
1778 within an inner block.
1779
1780 The FFE takes the latter approach.
1781
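The C sketch below shows the resulting nesting for subroutine @samp{X} above. Each @samp{A//B} or @samp{A//C} temporary lives in a block that closes before the corresponding @code{CALL}. The @code{ELSE IF} test and everything after it sit inside the @code{else} arm's inner block. The @code{lfunc} body is invented. Instead of making the calls, the function returns 1, 2, or 3 to report which of @samp{SUBR1}, @samp{SUBR2}, or @samp{SUBR3} would be called.

@smallexample
#include <string.h>

/* Hypothetical LFUNC: true if its argument is "YES".  */
static int
lfunc (const char *s)
@{
  return strcmp (s, "YES") == 0;
@}

/* Sketch of the block IF in subroutine X.  Returns 1, 2, or 3
   to report which of SUBR1, SUBR2, or SUBR3 would be called.  */
int
block_if_sketch (const char *a, const char *b, const char *c)
@{
  int cond;

  @{                            /* temporary for A//B */
    char t[strlen (a) + strlen (b) + 1];

    strcpy (t, a);
    strcat (t, b);
    cond = lfunc (t);
  @}                            /* released before CALL SUBR1 */
  if (cond)
    return 1;                  /* CALL SUBR1 */
  else
    @{                          /* inner block: ELSE IF to END IF */
      @{                        /* temporary for A//C */
        char t[strlen (a) + strlen (c) + 1];

        strcpy (t, a);
        strcat (t, c);
        cond = lfunc (t);
      @}                        /* released before CALL SUBR2 */
      if (cond)
        return 2;              /* CALL SUBR2 */
      else
        return 3;              /* CALL SUBR3 */
    @}
@}
@end smallexample
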
1782 @node Transforming SELECT CASE
1783 @subsection Transforming SELECT CASE
1784
1785 @code{SELECT CASE} poses a few interesting problems for code generation,
1786 if efficiency and frugal stack management are important.
1787
1788 Consider @samp{SELECT CASE (I('PREFIX'//A))},
1789 where @samp{A} is @code{CHARACTER*(*)}.
1790 In a case like this---basically,
1791 in any case where largish temporaries are needed
1792 to evaluate the expression---those temporaries should
1793 not be ``live'' during execution of any of the @code{CASE} blocks.
1794
1795 So, evaluation of the expression is best done within its own block,
1796 which in turn is within the @code{SELECT CASE} block itself
1797 (which contains the code for the CASE blocks as well,
though each within its own block).
1799
1800 Otherwise, we'd have the rough equivalent of this pseudo-code:
1801
1802 @smallexample
1803 @{
1804   char temp[large];
1805
1806   libg77_catenate (temp, 'prefix', a);
1807
1808   switch (i (temp))
1809     @{
1810     case 0:
1811       @dots{}
1812     @}
1813 @}
1814 @end smallexample
1815
And that would leave @samp{temp} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free @samp{temp} before executing the blocks).
1819
1820 So this approach is used instead:
1821
1822 @smallexample
1823 @{
1824   int temp0;
1825
1826   @{
1827     char temp1[large];
1828
1829     libg77_catenate (temp1, 'prefix', a);
1830     temp0 = i (temp1);
1831   @}
1832
1833   switch (temp0)
1834     @{
1835     case 0:
1836       @dots{}
1837     @}
1838 @}
1839 @end smallexample
1840
1841 Note how @samp{temp1} goes out of scope before starting the switch,
1842 thus making it easy for a back end to free it.
1843
1844 The problem @emph{that} solution has, however,
1845 is with @samp{SELECT CASE('prefix'//A)}
1846 (which is currently not supported).
1847
1848 Unless the GBEL is extended to support arbitrarily long character strings
1849 in its @code{case} facility,
1850 the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
1851 (probably excepting @code{CHARACTER*1})
1852 using a cascade of
1853 @code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
1854 in GBEL.
1855
1856 To prevent the (potentially large) temporary,
1857 needed to hold the selected expression itself (@samp{'prefix'//A}),
1858 from being in scope during execution of the @code{CASE} blocks,
1859 two approaches are available:
1860
1861 @itemize @bullet
1862 @item
1863 Pre-evaluate all the @code{CASE} tests,
1864 producing an integer ordinal that is used,
1865 a la @samp{temp0} in the earlier example,
1866 as if @samp{SELECT CASE(temp0)} had been written.
1867
1868 Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
1869 where @var{i} is the ordinal for that case,
1870 determined while, or before,
1871 generating the cascade of @code{if}-related constructs
1872 to cope with @code{CHARACTER} selection.
1873
1874 @item
1875 Make @samp{temp0} above just
1876 large enough to hold the longest @code{CASE} string
1877 that'll actually be compared against the expression
1878 (in this case, @samp{'prefix'//A}).
1879
1880 Since that length must be constant
1881 (because @code{CASE} expressions are all constant),
1882 it won't be so large,
1883 and, further, @samp{temp1} need not be dynamically allocated,
1884 since normal @code{CHARACTER} assignment can be used
1885 into the fixed-length @samp{temp0}.
1886 @end itemize
1887
Both of these solutions require the @code{SELECT CASE} implementation
1889 to be changed so all the corresponding @code{CASE} statements
1890 are seen during the actual code generation for @code{SELECT CASE}.
1891
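Here is a C sketch of the first approach listed above, for a hypothetical @samp{SELECT CASE ('prefix'//A)} with @samp{CASE ('prefixONE')}, @samp{CASE ('prefixTWO')}, and @samp{CASE DEFAULT}. All the @code{CASE} tests run inside the block that owns the concatenation temporary and produce an ordinal. The @code{switch} on that ordinal runs after the temporary is out of scope. The case values and return codes are made up, and @code{strcmp} ignores Fortran's blank padding.

@smallexample
#include <string.h>

/* Sketch of SELECT CASE ('prefix'//A) with hypothetical cases
   'prefixONE' and 'prefixTWO' plus CASE DEFAULT.  Returns a
   made-up code identifying the CASE block that ran.  */
int
select_case_sketch (const char *a)
@{
  int temp0;                     /* ordinal of the matching CASE */

  @{
    char temp1[sizeof "prefix" + strlen (a)];

    strcpy (temp1, "prefix");
    strcat (temp1, a);
    if (strcmp (temp1, "prefixONE") == 0)
      temp0 = 1;
    else if (strcmp (temp1, "prefixTWO") == 0)
      temp0 = 2;
    else
      temp0 = 0;                 /* CASE DEFAULT */
  @}                              /* temp1 is out of scope here */

  switch (temp0)                 /* as if SELECT CASE (temp0) */
    @{
    case 1:
      return 10;                 /* body of CASE ('prefixONE') */
    case 2:
      return 20;                 /* body of CASE ('prefixTWO') */
    default:
      return 0;                  /* body of CASE DEFAULT */
    @}
@}
@end smallexample
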
1892 @node Transforming Expressions
1893 @section Transforming Expressions
1894
1895 The interactions between statements, expressions, and subexpressions
1896 at program run time can be viewed as:
1897
1898 @smallexample
1899 @var{action}(@var{expr})
1900 @end smallexample
1901
1902 Here, @var{action} is the series of steps
1903 performed to effect the statement,
1904 and @var{expr} is the expression
1905 whose value is used by @var{action}.
1906
1907 Expanding the above shows a typical order of events at run time:
1908
1909 @smallexample
1910 Evaluate @var{expr}
1911 Perform @var{action}, using result of evaluation of @var{expr}
1912 Clean up after evaluating @var{expr}
1913 @end smallexample
1914
1915 So, if evaluating @var{expr} requires allocating memory,
1916 that memory can be freed before performing @var{action}
1917 only if it is not needed to hold the result of evaluating @var{expr}.
1918 Otherwise, it must be freed no sooner than
1919 after @var{action} has been performed.
1920
1921 The above are recursive definitions,
1922 in the sense that they apply to subexpressions of @var{expr}.
1923
1924 That is, evaluating @var{expr} involves
1925 evaluating all of its subexpressions,
1926 performing the @var{action} that computes the
1927 result value of @var{expr},
1928 then cleaning up after evaluating those subexpressions.
1929
1930 The recursive nature of this evaluation is implemented
1931 via recursive-descent transformation of the top-level statements,
1932 their expressions, @emph{their} subexpressions, and so on.
1933
1934 However, that recursive-descent transformation is,
1935 due to the nature of the GBEL,
1936 focused primarily on generating a @emph{single} stream of code
1937 to be executed at run time.
1938
1939 Yet, from the above, it's clear that multiple streams of code
1940 must effectively be simultaneously generated
1941 during the recursive-descent analysis of statements.
1942
1943 The primary stream implements the primary @var{action} items,
1944 while at least two other streams implement
1945 the evaluation and clean-up items.
1946
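The three streams are illustrated below with a hand-written C sketch, not anything the FFE emits. The sketch covers a statement whose @var{action} needs the whole value of @samp{A//B}. The evaluation stream allocates and fills a temporary, and the primary stream uses it. The clean-up stream then frees it afterward, because the temporary holds the result. The helper names and the use of @code{malloc} are assumptions for illustration.

@smallexample
#include <stdlib.h>
#include <string.h>

/* Evaluation: allocate and fill a temporary holding A//B.  */
static char *
evaluate_concat (const char *a, const char *b)
@{
  char *r = malloc (strlen (a) + strlen (b) + 1);

  if (r != NULL)
    @{
      strcpy (r, a);
      strcat (r, b);
    @}
  return r;
@}

/* Sketch of one statement whose action (here, just measuring the
   value) needs the result of A//B.  Because the temporary holds
   that result, it is freed only after the action.  */
size_t
statement_sketch (const char *a, const char *b)
@{
  size_t n;
  char *temp = evaluate_concat (a, b);     /* evaluation stream */

  if (temp == NULL)
    return 0;
  n = strlen (temp);                       /* primary (action) stream */
  free (temp);                             /* clean-up stream */
  return n;
@}
@end smallexample
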
1947 Requirements imposed by expressions include:
1948
1949 @itemize @bullet
1950 @item
1951 Whether the caller needs to have a temporary ready
1952 to hold the value of the expression.
1953
1954 @item
Other requirements (TBD).
1956 @end itemize
1957
1958 @node Internal Naming Conventions
1959 @section Internal Naming Conventions
1960
1961 Names exported by FFE modules have the following (regular-expression) forms.
1962 Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
1963 where @var{mod} is lowercase or uppercase alphanumerics, respectively,
1964 are exported by the module @code{ffe@var{mod}},
1965 with the source code doing the exporting in @file{@var{mod}.h}.
1966 (Usually, the source code for the implementation is in @file{@var{mod}.c}.)
1967
1968 Identifiers that don't fit the following forms
1969 are not considered exported,
1970 even if they are according to the C language.
1971 (For example, they might be made available to other modules
1972 solely for use within expansions of exported macros,
1973 not for use within any source code in those other modules.)
1974
1975 @table @code
1976 @item ffe@var{mod}
1977 The single typedef exported by the module.
1978
1979 @item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)
1981
1982 A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.
1983
@item ffe@var{mod}[A-Z][a-z0-9]*
1985 A typedef exported by the module.
1986
The portion of the identifier after @code{ffe@var{mod}} is
referred to as @var{ctype}, a capitalized (mixed-case) form
of @var{type}.
1990
1991 @item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)
1993
A @code{#define} or @code{enum} constant of the type
@code{ffe@var{mod}@var{ctype}},
where @var{type} is the lowercase form of @var{ctype}
in an exported typedef.
1998
1999 @item ffe@var{mod}_@var{value}
2000 A function that does or returns something,
2001 as described by @var{value} (see below).
2002
2003 @item ffe@var{mod}_@var{value}_@var{input}
2004 A function that does or returns something based
2005 primarily on the thing described by @var{input} (see below).
2006 @end table
2007
2008 Below are names used for @var{value} and @var{input},
2009 along with their definitions.
2010
2011 @table @code
2012 @item col
2013 A column number within a line (first column is number 1).
2014
2015 @item file
2016 An encapsulation of a file's name.
2017
2018 @item find
2019 Looks up an instance of some type that matches specified criteria,
2020 and returns that, even if it has to create a new instance or
2021 crash trying to find it (as appropriate).
2022
2023 @item initialize
2024 Initializes, usually a module.  No type.
2025
2026 @item int
2027 A generic integer of type @code{int}.
2028
2029 @item is
2030 A generic integer that contains a true (nonzero) or false (zero) value.
2031
2032 @item len
2033 A generic integer that contains the length of something.
2034
2035 @item line
2036 A line number within a source file,
2037 or a global line number.
2038
2039 @item lookup
2040 Looks up an instance of some type that matches specified criteria,
2041 and returns that, or returns nil.
2042
2043 @item name
2044 A @code{text} that points to a name of something.
2045
2046 @item new
2047 Makes a new instance of the indicated type.
2048 Might return an existing one if appropriate---if so,
2049 similar to @code{find} without crashing.
2050
2051 @item pt
2052 Pointer to a particular character (line, column pairs)
2053 in the input file (source code being compiled).
2054
2055 @item run
2056 Performs some herculean task.  No type.
2057
2058 @item terminate
2059 Terminates, usually a module.  No type.
2060
2061 @item text
2062 A @code{char *} that points to generic text.
2063 @end table
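
The minimal sketch below makes these conventions concrete with a hypothetical module @samp{foo}, giving the names @code{ffefoo} and @code{FFEFOO_}. It uses the @code{initialize}, @code{new}, @code{name}, @code{lookup}, and @code{terminate} forms described above. No such module exists in the FFE. The structure layout and the list-based lookup are invented for illustration.

@smallexample
#include <stdlib.h>
#include <string.h>

/* Hypothetical module "foo"; all names here are illustrations.  */
typedef struct ffefoo_ *ffefoo;       /* ffe<mod>: the module's typedef */
#define FFEFOO_NONE ((ffefoo) NULL)   /* FFE<UMOD>_...: an ffefoo constant */

struct ffefoo_
@{
  const char *name;                   /* caller's string; must outlive us */
  ffefoo next;
@};

static ffefoo ffefoo_list_;           /* file-local, so not exported */

void
ffefoo_initialize (void)              /* value = initialize */
@{
  ffefoo_list_ = FFEFOO_NONE;
@}

ffefoo
ffefoo_new (const char *name)         /* value = new */
@{
  ffefoo f = malloc (sizeof *f);

  if (f == NULL)
    return FFEFOO_NONE;
  f->name = name;
  f->next = ffefoo_list_;
  ffefoo_list_ = f;
  return f;
@}

const char *
ffefoo_name (ffefoo f)                /* value = name (a text) */
@{
  return f->name;
@}

ffefoo
ffefoo_lookup_name (const char *name) /* value = lookup, input = name */
@{
  ffefoo f;

  for (f = ffefoo_list_; f != FFEFOO_NONE; f = f->next)
    if (strcmp (f->name, name) == 0)
      return f;
  return FFEFOO_NONE;                 /* "returns nil" */
@}

void
ffefoo_terminate (void)               /* value = terminate */
@{
  while (ffefoo_list_ != FFEFOO_NONE)
    @{
      ffefoo next = ffefoo_list_->next;

      free (ffefoo_list_);
      ffefoo_list_ = next;
    @}
@}
@end smallexample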