gcc/f/ffe.texi

   1 @c Copyright (C) 1999 Free Software Foundation, Inc.
   2 @c This is part of the G77 manual.
   3 @c For copying conditions, see the file g77.texi.
   4
   5 @node Front End
   6 @chapter Front End
   7 @cindex GNU Fortran Front End (FFE)
   8 @cindex FFE
   9 @cindex @code{g77}, front end
  10 @cindex front end, @code{g77}
  11
  12 This chapter describes some aspects of the design and implementation
  13 of the @code{g77} front end.
  14
To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
  17 If you want to help by working on one or more of these items,
  18 email @email{gcc@@gcc.gnu.org}.
  19 If you're planning to do more than just research issues and offer comments,
  20 see @uref{http://www.gnu.org/software/contribute.html} for steps you might
  21 need to take first.
  22
  23 @menu
  24 * Overview of Sources::
  25 * Overview of Translation Process::
  26 * Philosophy of Code Generation::
  27 * Two-pass Design::
  28 * Challenges Posed::
  29 * Transforming Statements::
  30 * Transforming Expressions::
  31 * Internal Naming Conventions::
  32 @end menu
  33
  34 @node Overview of Sources
  35 @section Overview of Sources
  36
  37 The current directory layout includes the following:
  38
  39 @table @file
  40 @item @value{srcdir}/gcc/
  41 Non-g77 files in gcc
  42
  43 @item @value{srcdir}/gcc/f/
  44 GNU Fortran front end sources
  45
  46 @item @value{srcdir}/libf2c/
  47 @code{libg2c} configuration and @code{g2c.h} file generation
  48
  49 @item @value{srcdir}/libf2c/libF77/
  50 General support and math portion of @code{libg2c}
  51
  52 @item @value{srcdir}/libf2c/libI77/
  53 I/O portion of @code{libg2c}
  54
  55 @item @value{srcdir}/libf2c/libU77/
  56 Additional interfaces to Unix @code{libc} for @code{libg2c}
  57 @end table
  58
  59 Components of note in @code{g77} are described below.
  60
  61 @file{f/} as a whole contains the source for @code{g77},
  62 while @file{libf2c/} contains a portion of the separate program
  63 @code{f2c}.
  64 Note that the @code{libf2c} code is not part of the program @code{g77},
  65 just distributed with it.
  66
  67 @file{f/} contains text files that document the Fortran compiler, source
  68 files for the GNU Fortran Front End (FFE), and some other stuff.
  69 The @code{g77} compiler code is placed in @file{f/} because it,
  70 along with its contents,
  71 is designed to be a subdirectory of a @code{gcc} source directory,
  72 @file{gcc/},
  73 which is structured so that language-specific front ends can be ``dropped
  74 in'' as subdirectories.
The C++ front end (@code{g++}) is an example of this---it resides in
  76 the @file{cp/} subdirectory.
  77 Note that the C front end (also referred to as @code{gcc})
  78 is an exception to this, as its source files reside
  79 in the @file{gcc/} directory itself.
  80
  81 @file{libf2c/} contains the run-time libraries for the @code{f2c} program,
  82 also used by @code{g77}.
These libraries are normally referred to collectively as @code{libf2c}.
  84 When built as part of @code{g77},
  85 @code{libf2c} is installed under the name @code{libg2c} to avoid
  86 conflict with any existing version of @code{libf2c},
  87 and thus is often referred to as @code{libg2c} when the
  88 @code{g77} version is specifically being referred to.
  89
  90 The @code{netlib} version of @code{libf2c/}
  91 contains two distinct libraries,
  92 @code{libF77} and @code{libI77},
  93 each in their own subdirectories.
  94 In @code{g77}, this distinction is not made,
  95 beyond maintaining the subdirectory structure in the source-code tree.
  96
  97 @file{libf2c/} is not part of the program @code{g77},
  98 just distributed with it.
  99 It contains files not present
 100 in the official (@code{netlib}) version of @code{libf2c},
 101 and also contains some minor changes made from @code{libf2c},
 102 to fix some bugs,
 103 and to facilitate automatic configuration, building, and installation of
 104 @code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
 105 See @file{libf2c/README} for more information,
 106 including licensing conditions
 107 governing distribution of programs containing code from @code{libg2c}.
 108
 109 @code{libg2c}, @code{g77}'s version of @code{libf2c},
 110 adds Dave Love's implementation of @code{libU77},
 111 in the @file{libf2c/libU77/} directory.
 112 This library is distributed under the
 113 GNU Library General Public License (LGPL)---see the
 114 file @file{libf2c/libU77/COPYING.LIB}
 115 for more information,
 116 as this license
 117 governs distribution conditions for programs containing code
 118 from this portion of the library.
 119
 120 Files of note in @file{f/} and @file{libf2c/} are described below:
 121
 122 @table @file
 123 @item f/BUGS
Lists some important bugs known to be in @code{g77}.
Alternatively, use Info (or GNU Emacs Info mode) to read
the ``Actual Bugs'' node of the @code{g77} documentation:
 127
 128 @smallexample
 129 info -f f/g77.info -n "Actual Bugs"
 130 @end smallexample
 131
 132 @item f/ChangeLog
 133 Lists recent changes to @code{g77} internals.
 134
 135 @item libf2c/ChangeLog
 136 Lists recent changes to @code{libg2c} internals.
 137
 138 @item f/NEWS
 139 Contains the per-release changes.
 140 These include the user-visible
 141 changes described in the node ``Changes''
 142 in the @code{g77} documentation, plus internal
 143 changes of import.
 144 Or use:
 145
 146 @smallexample
 147 info -f f/g77.info -n News
 148 @end smallexample
 149
 150 @item f/g77.info*
 151 The @code{g77} documentation, in Info format,
 152 produced by building @code{g77}.
 153
 154 All users of @code{g77} (not just installers) should read this,
using the @code{more} command if neither the @code{info} command
nor GNU Emacs (with its Info mode) is available, or if users
 157 aren't yet accustomed to using these tools.
 158 All of these files are readable as ``plain text'' files,
 159 though they're easier to navigate using Info readers
 160 such as @code{info} and GNU Emacs Info mode.
 161 @end table
 162
 163 If you want to explore the FFE code, which lives entirely in @file{f/},
 164 here are a few clues.
 165 The file @file{g77spec.c} contains the @code{g77}-specific source code
 166 for the @code{g77} command only---this just forms a variant of the
 167 @code{gcc} command, so,
 168 just as the @code{gcc} command itself does not contain the C front end,
 169 the @code{g77} command does not contain the Fortran front end (FFE).
 170 The FFE code ends up in an executable named @file{f771},
 171 which does the actual compiling,
 172 so it contains the FFE plus the @code{gcc} back end (GBE),
 173 the latter to do most of the optimization, and the code generation.
 174
 175 The file @file{parse.c} is the source file for @code{yyparse()},
 176 which is invoked by the GBE to start the compilation process,
 177 for @file{f771}.
 178
The file @file{top.c} contains the top-level FFE function @code{ffe_file}
and, along with @file{top.h}, defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
and @samp{FFE_[A-Za-z].*} symbols.
 182
 183 The file @file{fini.c} is a @code{main()} program that is used when building
 184 the FFE to generate C header and source files for recognizing keywords.
 185 The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
 186 that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
 187 @samp{MALLOC_[A-Za-z].*} symbols.
 188
All other modules named @var{xyz}
comprise all files named @samp{@var{xyz}*.@var{ext}}
 191 and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
 192 and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
 193 If you understand all this, congratulations---it's easier for me to remember
 194 how it works than to type in these regular expressions.
 195 But it does make it easy to find where a symbol is defined.
 196 For example, the symbol @samp{ffexyz_set_something} would be defined
 197 in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
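
As a rough illustration (not part of the FFE sources),
the naming patterns above can be turned into a lookup that guesses
which module defines a given symbol.
Python is used only for brevity.
The function name @code{module_for_symbol} is hypothetical,
and the sketch assumes module names consist only of lowercase letters
and digits.

```python
import re

def module_for_symbol(symbol):
    """Guess which FFE module defines `symbol`, or return None."""
    # malloc_[a-z].*, malloc[A-Z].* and MALLOC_[A-Za-z].* belong
    # to the memory manager (malloc.c and malloc.h).
    if re.match(r"malloc(_[a-z]|[A-Z])|MALLOC_[A-Za-z]", symbol):
        return "malloc"
    # ffeXYZ_[a-z].* and ffeXYZ[A-Z].*: lowercase module name after "ffe".
    m = re.match(r"ffe([a-z0-9]*)(?:_[a-z]|[A-Z])", symbol)
    if m is None:
        # FFEXYZ_[A-Za-z].*: uppercase module name after "FFE".
        m = re.match(r"FFE([A-Z0-9]*)_[A-Za-z]", symbol)
    if m is None:
        return None
    # An empty module name means the symbol comes from top.c or top.h.
    return m.group(1).lower() or "top"

print(module_for_symbol("ffexyz_set_something"))  # prints "xyz"
print(module_for_symbol("ffe_file"))              # prints "top"
```

So @samp{ffestd_exec_end} would be looked for in @file{std.h}
and @file{std.c}.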
 198
 199 The ``porting'' files of note currently are:
 200
 201 @table @file
 202 @item proj.c
 203 @itemx proj.h
These define the ``language'' used by all the other source files,
 205 the language being Standard C plus some useful things
 206 like @code{ARRAY_SIZE} and such.
 207
 208 @item target.c
 209 @itemx target.h
 210 These describe the target machine
 211 in terms of what data types are supported,
 212 how they are denoted
 213 (to what C type does an @code{INTEGER*8} map, for example),
 214 how to convert between them,
 215 and so on.
 216 Over time, versions of @code{g77} rely less on this file
 217 and more on run-time configuration based on GBE info
 218 in @file{com.c}.
 219
 220 @item com.c
 221 @itemx com.h
 222 These are the primary interface to the GBE.
 223
 224 @item ste.c
 225 @itemx ste.h
These contain code for implementing recognized executable statements
 227 in the GBE.
 228
 229 @item src.c
 230 @itemx src.h
 231 These contain information on the format(s) of source files
 232 (such as whether they are never to be processed as case-insensitive
 233 with regard to Fortran keywords).
 234 @end table
 235
 236 If you want to debug the @file{f771} executable,
 237 for example if it crashes,
 238 note that the global variables @code{lineno} and @code{input_filename}
 239 are usually set to reflect the current line being read by the lexer
 240 during the first-pass analysis of a program unit and to reflect
 241 the current line being processed during the second-pass compilation
 242 of a program unit.
 243
 244 If an invocation of the function @code{ffestd_exec_end} is on the stack,
 245 the compiler is in the second pass, otherwise it is in the first.
 246
 247 (This information might help you reduce a test case and/or work around
 248 a bug in @code{g77} until a fix is available.)
 249
 250 @node Overview of Translation Process
 251 @section Overview of Translation Process
 252
 253 The order of phases translating source code to the form accepted
 254 by the GBE is:
 255
 256 @enumerate
 257 @item
 258 Stripping punched-card sources (@file{g77stripcard.c})
 259
 260 @item
 261 Lexing (@file{lex.c})
 262
 263 @item
 264 Stand-alone statement identification (@file{sta.c})
 265
 266 @item
 267 INCLUDE handling (@file{sti.c})
 268
 269 @item
 270 Order-dependent statement identification (@file{stq.c})
 271
 272 @item
 273 Parsing (@file{stb.c} and @file{expr.c})
 274
 275 @item
 276 Constructing (@file{stc.c})
 277
 278 @item
 279 Collecting (@file{std.c})
 280
 281 @item
 282 Expanding (@file{ste.c})
 283 @end enumerate
 284
 285 To get a rough idea of how a particularly twisted Fortran statement
 286 gets treated by the passes, consider:
 287
 288 @smallexample
 289       FORMAT(I2 4H)=(J/
 290      &   I3)
 291 @end smallexample
 292
 293 The job of @file{lex.c} is to know enough about Fortran syntax rules
 294 to break the statement up into distinct lexemes without requiring
 295 any feedback from subsequent phases:
 296
 297 @smallexample
 298 `FORMAT'
 299 `('
 300 `I24H'
 301 `)'
 302 `='
 303 `('
 304 `J'
 305 `/'
 306 `I3'
 307 `)'
 308 @end smallexample
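
The lexeme list above can be reproduced with a deliberately naive sketch.
It is not how @file{lex.c} is written.
It assumes fixed-form source, ignores @code{CHARACTER} and Hollerith
constants (where blanks are significant), and the function name
@code{lex_fixed_form} is made up for this example.

```python
import re

def lex_fixed_form(lines):
    """Split one fixed-form statement into lexemes, with no parser feedback."""
    # Columns 1-5 hold the label and column 6 the continuation mark,
    # so the statement field of every line starts at column 7 (index 6).
    text = "".join(line[6:] for line in lines)
    # Blanks are not significant in fixed form.
    text = text.replace(" ", "")
    # A run of letters and digits forms one lexeme; any other
    # character is a lexeme by itself.
    return re.findall(r"[A-Za-z0-9]+|.", text)

statement = [
    "      FORMAT(I2 4H)=(J/",
    "     &   I3)",
]
print(lex_fixed_form(statement))
# prints ['FORMAT', '(', 'I24H', ')', '=', '(', 'J', '/', 'I3', ')']
```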
 309
The job of @file{sta.c} is to figure out the kind of statement,
or, at least, statement form, that the sequence of lexemes represents.
 312
 313 The sooner it can do this (in terms of using the smallest number of
 314 lexemes, starting with the first for each statement), the better,
 315 because that leaves diagnostics for problems beyond the recognition
 316 of the statement form to subsequent phases,
 317 which can usually better describe the nature of the problem.
 318
 319 In this case, the @samp{=} at ``level zero''
 320 (not nested within parentheses)
 321 tells @file{sta.c} that this is an @emph{assignment-form},
 322 not @code{FORMAT}, statement.
 323
 324 An assignment-form statement might be a statement-function
 325 definition or an executable assignment statement.
 326
 327 To make that determination,
 328 @file{sta.c} looks at the first two lexemes.
 329
 330 Since the second lexeme is @samp{(},
 331 the first must represent an array for this to be an assignment statement,
 332 else it's a statement function.
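
The ``level zero'' test can be sketched as a scan that tracks
parenthesis depth.
This is only an illustration under simplifying assumptions:
the function names and the @samp{keyword-form} label are invented here,
and real identification has more cases to consider
(a @code{DO} statement, for example, also has an @samp{=} at level zero).

```python
def has_level_zero_equals(lexemes):
    """Return True if an '=' lexeme appears outside all parentheses."""
    depth = 0
    for lexeme in lexemes:
        if lexeme == "(":
            depth += 1
        elif lexeme == ")":
            depth -= 1
        elif lexeme == "=" and depth == 0:
            return True
    return False

def statement_form(lexemes):
    """Classify a lexeme sequence by statement form only."""
    if has_level_zero_equals(lexemes):
        return "assignment-form"
    return "keyword-form"  # hypothetical label for everything else

sample = ["FORMAT", "(", "I24H", ")", "=", "(", "J", "/", "I3", ")"]
print(statement_form(sample))  # prints "assignment-form"
```

Note that only the lexemes up to and including the @samp{=}
need to be examined to reach the decision for the sample statement.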
 333
 334 Either way, @file{sta.c} hands off the statement to @file{stq.c}
 335 (via @file{sti.c}, which expands INCLUDE files).
@file{stq.c} figures out what a statement that is ambiguous on its own
must actually be, based on the context
established by previous statements.
 339
 340 So, @file{stq.c} watches the statement stream for executable statements,
 341 END statements, and so on, so it knows whether @samp{A(B)=C} is
 342 (intended as) a statement-function definition or an assignment statement.
 343
 344 After establishing the context-aware statement info, @file{stq.c}
 345 passes the original sample statement on to @file{stb.c}
 346 (either its statement-function parser or its assignment-statement parser).
 347
 348 @file{stb.c} forms a
 349 statement-specific record containing the pertinent information.
 350 That information includes a source expression and,
 351 for an assignment statement, a destination expression.
 352 Expressions are parsed by @file{expr.c}.
 353
 354 This record is passed to @file{stc.c},
 355 which copes with the implications of the statement
 356 within the context established by previous statements.
 357
 358 For example, if it's the first statement in the file
 359 or after an @code{END} statement,
 360 @file{stc.c} recognizes that, first of all,
 361 a main program unit is now being lexed
 362 (and tells that to @file{std.c}
 363 before telling it about the current statement).
 364
 365 @file{stc.c} attaches whatever information it can,
 366 usually derived from the context established by the preceding statements,
 367 and passes the information to @file{std.c}.
 368
 369 @file{std.c} saves this information away,
 370 since the GBE cannot cope with information
 371 that might be incomplete at this stage.
 372
 373 For example, @samp{I3} might later be determined
 374 to be an argument to an alternate @code{ENTRY} point.
 375
 376 When @file{std.c} is told about the end of an external (top-level)
 377 program unit,
 378 it passes all the information it has saved away
 379 on statements in that program unit
 380 to @file{ste.c}.
 381
 382 @file{ste.c} ``expands'' each statement, in sequence, by
 383 constructing the appropriate GBE information and calling
 384 the appropriate GBE routines.
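
The hand-off between @file{std.c} and @file{ste.c} amounts to
deferring work until a program unit is complete.
A minimal sketch of that idea follows.
The class and method names are invented for illustration
and do not correspond to actual FFE functions.

```python
class StatementCollector:
    """Hold statements until the enclosing program unit is complete."""

    def __init__(self, expand):
        self.expand = expand  # callback standing in for ste.c
        self.saved = []       # statements of the current program unit

    def statement(self, info):
        # Information may still be incomplete, so just remember it.
        self.saved.append(info)

    def end_of_program_unit(self):
        # Only now is everything known, so hand over each saved
        # statement in its original order.
        for info in self.saved:
            self.expand(info)
        self.saved = []

expanded = []
collector = StatementCollector(expanded.append)
collector.statement("first statement")
collector.statement("second statement")
print(len(expanded))  # prints 0
collector.end_of_program_unit()
print(len(expanded))  # prints 2
```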
 385
 386 Details on the transformational phases follow.
 387 Keep in mind that Fortran numbering is used,
 388 so the first character on a line is column 1,
 389 decimal numbering is used, and so on.
 390
 391 @menu
 392 * g77stripcard::
 393 * lex.c::
 394 * sta.c::
 395 * sti.c::
 396 * stq.c::
 397 * stb.c::
 398 * expr.c::
 399 * stc.c::
 400 * std.c::
 401 * ste.c::
 402
 403 * Gotchas (Transforming)::
 404 * TBD (Transforming)::
 405 @end menu
 406
 407 @node g77stripcard
 408 @subsection g77stripcard
 409
 410 The @code{g77stripcard} program handles removing content beyond
 411 column 72 (adjustable via a command-line option),
 412 optionally warning about that content being something other
 413 than trailing whitespace or Fortran commentary.
 414
 415 This program is needed because @code{lex.c} doesn't pay attention
 416 to maximum line lengths at all, to make it easier to maintain,
 417 as well as faster (for sources that don't depend on the maximum
 418 column length vis-a-vis trailing non-blank non-commentary content).
 419
 420 Just how this program will be run---whether automatically for
 421 old source (perhaps as the default for @file{.f} files?)---is not
 422 yet determined.
 423
 424 In the meantime, it might as well be implemented as a typical UNIX pipe.
 425
 426 It should accept a @samp{-fline-length-@var{n}} option,
 427 with the default line length set to 72.
 428
 429 When the text it strips off the end of a line is not blank
 430 (not spaces and tabs),
 431 it should insert an additional comment line
 432 (beginning with @samp{!},
 433 so it works for both fixed-form and free-form files)
 434 containing the text,
 435 following the stripped line.
 436 The inserted comment should have a prefix of some kind,
 437 TBD, that distinguishes the comment as representing stripped text.
 438 Users could use that to @code{sed} out such lines, if they wished---it
 439 seems silly to provide a command-line option to delete information
 440 when it can be so easily filtered out by another program.
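
A minimal sketch of the stripping step described above follows.
Since the comment prefix is TBD, the value of @code{STRIPPED_PREFIX}
is purely a placeholder, and the function name @code{strip_card}
is likewise an assumption.
The sketch does not implement the optional warning.

```python
STRIPPED_PREFIX = "!stripped: "  # hypothetical; the real prefix is TBD

def strip_card(line, line_length=72):
    """Return the output lines for one input line (newline removed)."""
    kept, extra = line[:line_length], line[line_length:]
    result = [kept]
    # "Blank" means only spaces and tabs.  Anything else is preserved
    # in a comment line that follows the stripped line.
    if extra.strip(" \t"):
        result.append(STRIPPED_PREFIX + extra)
    return result

card = "      I = 1".ljust(72) + "00000010"
for out in strip_card(card):
    print(out.rstrip())
```

Run on the sample card, this prints the statement line followed by
a comment line carrying the sequence number @samp{00000010}.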
 441
 442 (This inserted comment should be designed to ``fit in'' well
 443 with whatever the Fortran community is using these days for
 444 preprocessor, translator, and other such products, like OpenMP.
 445 What that's all about, and how @code{g77} can elegantly fit its
 446 special comment conventions into it all, is TBD as well.
 447 We don't want to reinvent the wheel here, but if there turn out
 448 to be too many conflicting conventions, we might have to invent
 449 one that looks nothing like the others, but which offers their
 450 host products a better infrastructure in which to fit and coexist
 451 peacefully.)
 452
 453 @code{g77stripcard} probably shouldn't do any tab expansion or other
 454 fancy stuff.
 455 People can use @code{expand} or other pre-filtering if they like.
 456 The idea here is to keep each stage quite simple, while providing
 457 excellent performance for ``normal'' code.
 458
(Code with junk beyond column 72 is not really ``normal'',
 460 as it comes from a card-punch heritage,
 461 and will be increasingly hard for tomorrow's Fortran programmers to read.)
 462
 463 @node lex.c
 464 @subsection lex.c
 465
 466 To help make the lexer simple, fast, and easy to maintain,
 467 while also having @code{g77} generally encourage Fortran programmers
 468 to write simple, maintainable, portable code by maximizing the
 469 performance of compiling that kind of code:
 470
 471 @itemize @bullet
 472 @item
 473 There'll be just one lexer, for both fixed-form and free-form source.
 474
 475 @item
 476 It'll care about the form only when handling the first 7 columns of
 477 text, stuff like spaces between strings of alphanumerics, and
 478 how lines are continued.
 479
 480 Some other distinctions will be handled by subsequent phases,
 481 so at least one of them will have to know which form is involved.
 482
 483 For example, @samp{I = 2 . 4} is acceptable in fixed form,
 484 and works in free form as well given the implementation @code{g77}
 485 presently uses.
 486 But the standard requires a diagnostic for it in free form,
 487 so the parser has to be able to recognize that
 488 the lexemes aren't contiguous
 489 (information the lexer @emph{does} have to provide)
 490 and that free-form source is being parsed,
 491 so it can provide the diagnostic.
 492
 493 The @code{g77} lexer doesn't try to gather @samp{2 . 4} into a single lexeme.
 494 Otherwise, it'd have to know a whole lot more about how to parse Fortran,
 495 or subsequent phases (mainly parsing) would have two paths through
lots of critical code---one to handle the lexemes @samp{2}, @samp{.},
and @samp{4} in sequence, another to handle the lexeme @samp{2.4}.
 498
 499 @item
 500 It won't worry about line lengths
 501 (beyond the first 7 columns for fixed-form source).
 502
 503 That is, once it starts parsing the ``statement'' part of a line
 504 (column 7 for fixed-form, column 1 for free-form),
 505 it'll keep going until it finds a newline,
 506 rather than ignoring everything past a particular column
 507 (72 or 132).
 508
 509 The implication here is that there shouldn't @emph{be}
 510 anything past that last column, other than whitespace or
 511 commentary, because users using typical editors
 512 (or viewing output as typically printed)
 513 won't necessarily know just where the last column is.
 514
 515 Code that has ``garbage'' beyond the last column
 516 (almost certainly only fixed-form code with a punched-card legacy,
 517 such as code using columns 73-80 for ``sequence numbers'')
 518 will have to be run through @code{g77stripcard} first.
 519
 520 Also, keeping track of the maximum column position while also watching out
 521 for the end of a line @emph{and} while reading from a file
 522 just makes things slower.
 523 Since a file must be read, and watching for the end of the line
 524 is necessary (unless the typical input file was preprocessed to
 525 include the necessary number of trailing spaces),
 526 dropping the tracking of the maximum column position
 527 is the only way to reduce the complexity of the pertinent code
 528 while maintaining high performance.
 529
 530 @item
 531 ASCII encoding is assumed for the input file.
 532
 533 Code written in other character sets will have to be converted first.
 534
 535 @item
 536 Tabs (ASCII code 9)
 537 will be converted to spaces via the straightforward
 538 approach.
 539
 540 Specifically, a tab is converted to between one and eight spaces
 541 as necessary to reach column @var{n},
 542 where dividing @samp{(@var{n} - 1)} by eight
 543 results in a remainder of zero.
 544
 545 That saves having to pass most source files through @code{expand}.
 546
 547 @item
 548 Linefeeds (ASCII code 10)
 549 mark the ends of lines.
 550
 551 @item
 552 A carriage return (ASCII code 13)
is accepted if it immediately precedes a linefeed,
 554 in which case it is ignored.
 555
 556 Otherwise, it is rejected (with a diagnostic).
 557
 558 @item
Any characters other than the above
 560 that are not part of the GNU Fortran Character Set
 561 (@pxref{Character Set})
 562 are rejected with a diagnostic.
 563
 564 This includes backspaces, form feeds, and the like.
 565
 566 (It might make sense to allow a form feed in column 1
 567 as long as that's the only character on a line.
 568 It certainly wouldn't seem to cost much in terms of performance.)
 569
 570 @item
 571 The end of the input stream (EOF)
 572 ends the current line.
 573
 574 @item
 575 The distinction between uppercase and lowercase letters
 576 will be preserved.
 577
 578 It will be up to subsequent phases to decide to fold case.
 579
 580 Current plans are to permit any casing for Fortran (reserved) keywords
 581 while preserving casing for user-defined names.
 582 (This might not be made the default for @file{.f} files, though.)
 583
 584 Preserving case seems necessary to provide more direct access
 585 to facilities outside of @code{g77}, such as to C or Pascal code.
 586
Names of intrinsics will probably be matchable in any case.
 588
 589 (How @samp{external SiN; r = sin(x)} would be handled is TBD.
 590 I think old @code{g77} might already handle that pretty elegantly,
 591 but whether we can cope with allowing the same fragment to reference
 592 a @emph{different} procedure, even with the same interface,
 593 via @samp{s = SiN(r)}, needs to be determined.
 594 If it can't, we need to make sure that when code introduces
 595 a user-defined name, any intrinsic matching that name
 596 using a case-insensitive comparison
 597 is ``turned off''.)
 598
 599 @item
 600 Backslashes in @code{CHARACTER} and Hollerith constants
 601 are not allowed.
 602
 603 This avoids the confusion introduced by some Fortran compiler vendors
 604 providing C-like interpretation of backslashes,
 605 while others provide straight-through interpretation.
 606
 607 Some kind of lexical construct (TBD) will be provided to allow
 608 flagging of a @code{CHARACTER}
 609 (but probably not a Hollerith)
 610 constant that permits backslashes.
 611 It'll necessarily be a prefix, such as:
 612
 613 @smallexample
 614 PRINT *, C'This line has a backspace \b here.'
 615 PRINT *, F'This line has a straight backslash \ here.'
 616 @end smallexample
 617
 618 Further, command-line options might be provided to specify that
 619 one prefix or the other is to be assumed as the default
 620 for @code{CHARACTER} constants.
 621
However, it seems more helpful for @code{g77} to provide a program
that prefixes all constants
 624 (or just those containing backslashes)
 625 with the desired designation,
 626 so printouts of code can be read
 627 without knowing the compile-time options used when compiling it.
 628
 629 If such a program is provided
 630 (let's name it @code{g77slash} for now),
 631 then a command-line option to @code{g77} should not be provided.
 632 (Though, given that it'll be easy to implement, it might be hard
 633 to resist user requests for it ``to compile faster than if we
 634 have to invoke another filter''.)
 635
 636 This program would take a command-line option to specify the
default interpretation of backslashes,
 638 affecting which prefix it uses for constants.
 639
 640 @code{g77slash} probably should automatically convert Hollerith
constants that contain backslashes
 642 to the appropriate @code{CHARACTER} constants.
 643 Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
 644 constants specifying whether they want C-style or straight-through
 645 backslashes.
 646
 647 @item
 648 To allow for form-neutral INCLUDE files without requiring them
 649 to be preprocessed,
 650 the fixed-form lexer should offer an extension (if possible)
 651 allowing a trailing @samp{&} to be ignored, especially if after
 652 column 72, as it would be using the traditional Unix Fortran source
 653 model (which ignores @emph{everything} after column 72).
 654 @end itemize
 655
 656 The above implements nearly exactly what is specified by
 657 @ref{Character Set},
 658 and
 659 @ref{Lines},
 660 except it also provides automatic conversion of tabs
 661 and ignoring of newline-related carriage returns,
 662 as well as accommodating form-neutral INCLUDE files.
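
The tab and line-ending rules listed earlier are simple enough
to sketch directly.
The following is an illustration only, not the code in @file{lex.c}.
The function names are invented, and the sketch does not check
for characters outside the GNU Fortran Character Set.

```python
def expand_tabs(line):
    """Replace each tab with 1 to 8 spaces, so the next character lands
    in a column n (1-based) where (n - 1) is a multiple of eight."""
    out = ""
    for ch in line:
        if ch == "\t":
            # len(out) equals (next column - 1).
            out += " " * (8 - len(out) % 8)
        else:
            out += ch
    return out

def split_lines(text):
    """Split text into lines, applying the CR, LF, and EOF rules."""
    # A carriage return is ignored only directly before a linefeed.
    text = text.replace("\r\n", "\n")
    if "\r" in text:
        raise ValueError("carriage return not followed by linefeed")
    lines = text.split("\n")
    # EOF ends the current line, so a missing final newline
    # makes no difference.
    if lines[-1] == "":
        lines.pop()
    return [expand_tabs(line) for line in lines]

print(split_lines("\tI = 1\r\n\tEND"))
# prints ['        I = 1', '        END']
```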
 663
 664 It also implements the ``pure visual'' model,
 665 by which is meant that a user viewing his code
 666 in a typical text editor
 667 (assuming it's not preprocessed via @code{g77stripcard} or similar)
 668 doesn't need any special knowledge
 669 of whether spaces on the screen are really tabs,
 670 whether lines end immediately after the last visible non-space character
 671 or after a number of spaces and tabs that follow it,
 672 or whether the last line in the file is ended by a newline.
 673
 674 Most editors don't make these distinctions,
 675 the ANSI FORTRAN 77 standard doesn't require them to,
 676 and it permits a standard-conforming compiler
 677 to define a method for transforming source code to
 678 ``standard form'' however it wants.
 679
 680 So, GNU Fortran defines it such that users have the best chance
 681 of having the code be interpreted the way it looks on the screen
 682 of the typical editor.
 683
 684 (Fancy editors should @emph{never} be required to correctly read code
 685 written in classic two-dimensional-plaintext form.
 686 By correct reading I mean ability to read it, book-like, without
 687 mistaking text ignored by the compiler for program code and vice versa,
 688 and without having to count beyond the first several columns.
 689 The vague meaning of ASCII TAB, among other things, complicates
 690 this somewhat, but as long as ``everyone'', including the editor,
 691 other tools, and printer, agrees about the every-eighth-column convention,
 692 the GNU Fortran ``pure visual'' model meets these requirements.
 693 Any language or user-visible source form
 694 requiring special tagging of tabs,
 695 the ends of lines after spaces/tabs,
 696 and so on, fails to meet this fairly straightforward specification.
 697 Fortunately, Fortran @emph{itself} does not mandate such a failure,
 698 though most vendor-supplied defaults for their Fortran compilers @emph{do}
 699 fail to meet this specification for readability.)
 700
 701 Further, this model provides a clean interface
 702 to whatever preprocessors or code-generators are used
 703 to produce input to this phase of @code{g77}.
 704 Mainly, they need not worry about long lines.
 705
 706 @node sta.c
 707 @subsection sta.c
 708
 709 @node sti.c
 710 @subsection sti.c
 711
 712 @node stq.c
 713 @subsection stq.c
 714
 715 @node stb.c
 716 @subsection stb.c
 717
 718 @node expr.c
 719 @subsection expr.c
 720
 721 @node stc.c
 722 @subsection stc.c
 723
 724 @node std.c
 725 @subsection std.c
 726
 727 @node ste.c
 728 @subsection ste.c
 729
 730 @node Gotchas (Transforming)
 731 @subsection Gotchas (Transforming)
 732
 733 This section is not about transforming ``gotchas'' into something else.
 734 It is about the weirder aspects of transforming Fortran,
 735 however that's defined,
 736 into a more modern, canonical form.
 737
 738 @subsubsection Multi-character Lexemes
 739
 740 Each lexeme carries with it a pointer to where it appears in the source.
 741
 742 To provide the ability for diagnostics to point to column numbers,
 743 in addition to line numbers and names,
 744 lexemes that represent more than one (significant) character
 745 in the source code need, generally,
 746 to provide pointers to where each @emph{character} appears in the source.
 747
 748 This provides the ability to properly identify the precise location
 749 of the problem in code like
 750
 751 @smallexample
 752 SUBROUTINE X
 753 END
 754 BLOCK DATA X
 755 END
 756 @end smallexample
 757
 758 which, in fixed-form source, would result in single lexemes
 759 consisting of the strings @samp{SUBROUTINEX} and @samp{BLOCKDATAX}.
 760 (The problem is that @samp{X} is defined twice,
 761 so a pointer to the @samp{X} in the second definition,
as well as a follow-up pointer to the corresponding @samp{X} in the first,
 763 would be preferable to pointing to the beginnings of the statements.)
 764
 765 This need also arises when parsing (and diagnosing) @code{FORMAT}
 766 statements.
 767
 768 Further, it arises when diagnosing
 769 @code{FMT=} specifiers that contain constants
 770 (or partial constants, or even propagated constants!)
 771 in I/O statements, as in:
 772
 773 @smallexample
 774 PRINT '(I2, 3HAB)', J
 775 @end smallexample
 776
(A pointer to the beginning of the prematurely-terminated Hollerith
constant, and/or to the close parenthesis, is preferable to a pointer
to the open parenthesis or the apostrophe that precedes it.)
 780
 781 Multi-character lexemes, which would seem to naturally include
 782 at least digit strings, alphanumeric strings, @code{CHARACTER}
 783 constants, and Hollerith constants, therefore need to provide
 784 location information on each character.
 785 (Maybe Hollerith constants don't, but it's unnecessary to except them.)
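
One way to picture per-character location information is a lexeme
that carries a list of (line, column) pairs, one per significant character.
The sketch below makes several assumptions: it handles a single
fixed-form line, treats every blank as insignificant,
does not treat columns 1 through 6 specially,
and uses an invented function name.

```python
def lex_with_locations(line, line_number):
    """Return (text, locations) pairs for one fixed-form line."""
    lexemes = []
    text, locations = "", []
    for index, ch in enumerate(line):
        if ch == " ":
            continue  # insignificant, but columns still advance
        where = (line_number, index + 1)  # columns are numbered from 1
        if ch.isalnum():
            text += ch
            locations.append(where)
        else:
            if text:
                lexemes.append((text, locations))
                text, locations = "", []
            lexemes.append((ch, [where]))
    if text:
        lexemes.append((text, locations))
    return lexemes

text, locations = lex_with_locations("      BLOCK DATA X", 3)[0]
print(text)           # prints "BLOCKDATAX"
print(locations[-1])  # prints (3, 18)
```

With this information, a diagnostic about the duplicate definition
can point at column 18, where the @samp{X} actually appears,
rather than at the start of the statement.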
 786
 787 The question then arises, what about @emph{other} multi-character lexemes,
 788 such as @samp{**} and @samp{//},
 789 and Fortran 90's @samp{(/}, @samp{/)}, @samp{::}, and so on?
 790
 791 Turns out there's a need to identify the location of the second character
 792 of these two-character lexemes.
 793 For example, in @samp{I(/J) = K}, the slash needs to be diagnosed
as the problem, not the open parenthesis.
 795 Similarly, it is preferable to diagnose the second slash in
 796 @samp{I = J // K} rather than the first, given the implicit typing
 797 rules, which would result in the compiler disallowing the attempted
 798 concatenation of two integers.
 799 (Though, since that's more of a semantic issue,
 800 it's not @emph{that} much preferable.)
 801
 802 Even sequences that could be parsed as digit strings could use location info,
 803 for example, to diagnose the @samp{9} in the octal constant @samp{O'129'}.
 804 (This probably will be parsed as a character string,
 805 to be consistent with the parsing of @samp{Z'129A'}.)
 806
 807 To avoid the hassle of recording the location of the second character,
 808 while also preserving the general rule that each significant character
 809 is distinctly pointed to by the lexeme that contains it,
 810 it's best to simply not have any fixed-size lexemes
 811 larger than one character.
 812
 813 This new design is expected to make checking for two
 814 @samp{*} lexemes in a row much easier than the old design,
 815 so this is not much of a sacrifice.
 816 It probably makes the lexer much easier to implement
 817 than it makes the parser harder.
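
The parser-side check for two adjacent one-character lexemes
can be sketched in C as follows.
This is only an illustration under stated assumptions, not the FFE's code:
the names @code{struct lexeme}, @code{lex_line}, and @code{is_pair} are hypothetical,
every non-blank character becomes its own lexeme
(a fuller design would still gather names and digit strings into longer lexemes),
and free-form rules are assumed, so the two characters must occupy neighboring columns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lexeme record: one significant character plus its
   source position.  These names are illustrative, not the FFE's.  */
struct lexeme
{
  char ch;      /* the significant character */
  int line;     /* 1-based source line */
  int column;   /* 1-based source column */
};

/* Split TEXT (one source line) into single-character lexemes,
   skipping blanks.  Returns the number of lexemes stored in OUT,
   never more than MAX.  */
size_t
lex_line (const char *text, int line, struct lexeme *out, size_t max)
{
  size_t n = 0;
  int col;

  assert (text != NULL && out != NULL);
  for (col = 1; text[col - 1] != '\0' && n < max; ++col)
    if (text[col - 1] != ' ')
      {
        out[n].ch = text[col - 1];
        out[n].line = line;
        out[n].column = col;
        ++n;
      }
  return n;
}

/* Parser-side check: do lexemes I and I+1 form the two-character
   operator FIRST SECOND, on the same line and in adjacent columns?  */
int
is_pair (const struct lexeme *lx, size_t n, size_t i,
         char first, char second)
{
  return i + 1 < n
         && lx[i].ch == first && lx[i + 1].ch == second
         && lx[i].line == lx[i + 1].line
         && lx[i + 1].column == lx[i].column + 1;
}
```

With this shape, a diagnostic about @samp{I = J // K} can point at the
second slash simply by using the location stored in the second lexeme.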
 818
 819 @subsubsection Space-padding Lexemes
 820
 821 Certain lexemes need to be padded with virtual spaces when the
 822 end of the line (or file) is encountered.
 823
 824 This is necessary in fixed form, to handle lines that don't
 825 extend to column 72, assuming that's the line length in effect.
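
One way to obtain this effect, sketched here in C,
is to pad each fixed-form line before lexing it,
so that any lexeme that reaches the end of the line sees the virtual spaces.
The function name @code{pad_fixed_form_line} and the macro
@code{FIXED_FORM_WIDTH} are hypothetical,
and a line length of 72 is assumed to be in effect.

```c
#include <assert.h>
#include <string.h>

#define FIXED_FORM_WIDTH 72   /* assumed line length in effect */

/* Copy the fixed-form source line SRC into DST, padding with virtual
   spaces up to FIXED_FORM_WIDTH columns and dropping anything beyond.
   DST must have room for FIXED_FORM_WIDTH + 1 characters.  */
void
pad_fixed_form_line (char *dst, const char *src)
{
  size_t len;

  assert (dst != NULL && src != NULL);
  len = strlen (src);
  if (len > FIXED_FORM_WIDTH)
    len = FIXED_FORM_WIDTH;
  memcpy (dst, src, len);
  memset (dst + len, ' ', FIXED_FORM_WIDTH - len);
  dst[FIXED_FORM_WIDTH] = '\0';
}
```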
 826
 827 @subsubsection Bizarre Free-form Hollerith Constants
 828
 829 Last I checked, the Fortran 90 standard actually required the compiler
 830 to silently accept something like
 831
 832 @smallexample
 833 FORMAT ( 1 2   Htwelve chars )
 834 @end smallexample
 835
 836 as a valid @code{FORMAT} statement specifying a twelve-character
 837 Hollerith constant.
 838
 839 The implication here is that, since the new lexer is a zero-feedback one,
 840 it won't know that the special case of a @code{FORMAT} statement being parsed
 841 requires apparently distinct lexemes @samp{1} and @samp{2} to be treated as
 842 a single lexeme.
 843
 844 (This is a horrible misfeature of the Fortran 90 language.
 845 It's one of many such misfeatures that almost make me want
 846 to not support them, and forge ahead with designing a new
 847 ``GNU Fortran'' language that has the features,
 848 but not the misfeatures, of Fortran 90,
 849 and provide utility programs to do the conversion automatically.)
 850
 851 So, the lexer must gather distinct chunks of decimal strings into
 852 a single lexeme in contexts where a single decimal lexeme might
 853 start a Hollerith constant.
 854
 855 (Which probably means it might as well do that all the time
 856 for all multi-character lexemes, even in free-form mode,
 857 leaving it to subsequent phases to pull them apart as they see fit.)
 858
 859 Compare the treatment of this to how
 860
 861 @smallexample
 862 CHARACTER * 4 5 HEY
 863 @end smallexample
 864
 865 and
 866
 867 @smallexample
 868 CHARACTER * 12 HEY
 869 @end smallexample
 870
 871 must be treated---the former must be diagnosed, due to the separation
 872 between lexemes, the latter must be accepted as a proper declaration.
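
A minimal C sketch of such gathering follows.
It merges blank-separated digit chunks into one lexeme
but records that a gap was seen,
so that a later phase can accept the gap in a @code{FORMAT} statement
and diagnose it after @samp{CHARACTER *}.
The names @code{struct decimal_lexeme} and @code{gather_decimal}
are hypothetical, and overflow of the value is ignored.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result of gathering a decimal lexeme.  */
struct decimal_lexeme
{
  unsigned long value;   /* numeric value of the gathered digits */
  int had_gap;           /* nonzero if blanks separated digit chunks */
  size_t length;         /* source characters consumed */
};

/* Gather digits starting at TEXT, merging blank-separated chunks
   such as "1 2" into one lexeme.  The gap is recorded rather than
   diagnosed, so a later phase can decide whether it is acceptable.  */
struct decimal_lexeme
gather_decimal (const char *text)
{
  struct decimal_lexeme lx = { 0, 0, 0 };
  size_t i = 0, last_digit_end = 0;
  int seen_digit = 0;

  assert (text != NULL);
  for (;; ++i)
    {
      unsigned char c = (unsigned char) text[i];

      if (isdigit (c))
        {
          if (seen_digit && i != last_digit_end)
            lx.had_gap = 1;
          lx.value = lx.value * 10 + (unsigned long) (c - '0');
          last_digit_end = i + 1;
          seen_digit = 1;
        }
      else if (c != ' ' || !seen_digit)
        break;   /* stop at the first non-blank, non-digit character */
    }
  lx.length = last_digit_end;
  return lx;
}
```

Given the text @samp{1 2   Htwelve chars )}, this yields the value 12
with the gap flag set; given @samp{12 HEY}, the value 12 with no gap.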
 873
 874 @subsubsection Hollerith Constants
 875
 876 Recognizing a Hollerith constant---specifically,
 877 that an @samp{H} or @samp{h} after a digit string begins
 878 such a constant---requires some knowledge of context.
 879
 880 Hollerith constants (such as @samp{2HAB}) can appear after:
 881
 882 @itemize @bullet
 883 @item
 884 @samp{(}
 885
 886 @item
 887 @samp{,}
 888
 889 @item
 890 @samp{=}
 891
 892 @item
 893 @samp{+}, @samp{-}, @samp{/}
 894
 895 @item
 896 @samp{*}, except as noted below
 897 @end itemize
 898
 899 Hollerith constants don't appear after:
 900
 901 @itemize @bullet
 902 @item
 903 @samp{CHARACTER*},
 904 which can be treated generally as
 905 any @samp{*} that is the second lexeme of a statement
 906 @end itemize
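
The two lists above can be encoded as a small predicate.
The following C sketch is an illustration only:
the function name @code{hollerith_may_follow} and both of its parameters
are assumptions, chosen to express the rule that a @samp{*}
appearing as the second lexeme of a statement is treated
like the one in @samp{CHARACTER*}.

```c
#include <assert.h>

/* Decide whether a digit string followed by 'H' or 'h' may begin a
   Hollerith constant, given the preceding lexeme.

   PREV is the character of the lexeme just before the digit string,
   and PREV_INDEX is that lexeme's 1-based position within the
   statement.  Both parameters are assumptions of this sketch.  */
int
hollerith_may_follow (char prev, int prev_index)
{
  assert (prev_index >= 1);
  switch (prev)
    {
    case '(':
    case ',':
    case '=':
    case '+':
    case '-':
    case '/':
      return 1;
    case '*':
      /* A '*' that is the second lexeme of a statement is treated
         like the one in "CHARACTER*", which takes a length.  */
      return prev_index != 2;
    default:
      return 0;
    }
}
```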
 907
 908 @subsubsection Confusing Function Keyword
 909
 910 While
 911
 912 @smallexample
 913 REAL FUNCTION FOO ()
 914 @end smallexample
 915
 916 must be a @code{FUNCTION} statement and
 917
 918 @smallexample
 919 REAL FUNCTION FOO (5)
 920 @end smallexample
 921
must be a type-declaration statement,
 923
 924 @smallexample
 925 REAL FUNCTION FOO (@var{names})
 926 @end smallexample
 927
 928 where @var{names} is a comma-separated list of names,
 929 can be one or the other.
 930
The only way to disambiguate that statement
(short of mandating free-form source or a short maximum
length for names of external procedures)
is based on the context of the statement.

In particular, if the statement is known to be within an
already-started program unit
(but not at the outer level of the @code{CONTAINS} block),
it is a type-declaration statement.
 940
 941 Otherwise, the statement is a @code{FUNCTION} statement,
 942 in that it begins a function program unit
 943 (external, or, within @code{CONTAINS}, nested).
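
Expressed as code, the context rule is a single test.
The C sketch below is hypothetical throughout:
@code{struct parse_context}, its two flags, and
@code{classify_real_function} are names invented for illustration,
and a real parser would derive the flags from its own state.

```c
#include <assert.h>
#include <stddef.h>

enum stmt_kind { STMT_FUNCTION, STMT_TYPE_DECL };

/* Hypothetical parser context when the ambiguous statement is seen.  */
struct parse_context
{
  int in_program_unit;     /* a program unit has already started */
  int at_contains_outer;   /* at the outer level of a CONTAINS block */
};

/* Classify "REAL FUNCTION FOO (names)" from context alone.  */
enum stmt_kind
classify_real_function (const struct parse_context *ctx)
{
  assert (ctx != NULL);
  if (ctx->in_program_unit && !ctx->at_contains_outer)
    return STMT_TYPE_DECL;   /* declares the array or variable FUNCTIONFOO */
  return STMT_FUNCTION;      /* begins a function program unit */
}
```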
 944
 945 @subsubsection Weird READ
 946
 947 The statement
 948
 949 @smallexample
 950 READ (N)
 951 @end smallexample
 952
 953 is equivalent to either
 954
 955 @smallexample
 956 READ (UNIT=(N))
 957 @end smallexample
 958
 959 or
 960
 961 @smallexample
 962 READ (FMT=(N))
 963 @end smallexample
 964
 965 depending on which would be valid in context.
 966
 967 Specifically, if @samp{N} is type @code{INTEGER},
 968 @samp{READ (FMT=(N))} would not be valid,
 969 because parentheses may not be used around @samp{N},
 970 whereas they may around it in @samp{READ (UNIT=(N))}.
 971
 972 Further, if @samp{N} is type @code{CHARACTER},
 973 the opposite is true---@samp{READ (UNIT=(N))} is not valid,
 974 but @samp{READ (FMT=(N))} is.
 975
 976 Strictly speaking, if anything follows
 977
 978 @smallexample
 979 READ (N)
 980 @end smallexample
 981
 982 in the statement, whether the first lexeme after the close
parenthesis is a comma could be used to disambiguate the two cases,
 984 without looking at the type of @samp{N},
 985 because the comma is required for the @samp{READ (FMT=(N))}
 986 interpretation and disallowed for the @samp{READ (UNIT=(N))}
 987 interpretation.
 988
 989 However, in practice, many Fortran compilers allow
 990 the comma for the @samp{READ (UNIT=(N))}
 991 interpretation anyway
 992 (in that they generally allow a leading comma before
 993 an I/O list in an I/O statement),
 994 and much code takes advantage of this allowance.
 995
 996 (This is quite a reasonable allowance, since the
 997 juxtaposition of a comma-separated list immediately
 998 after an I/O control-specification list, which is also comma-separated,
 999 without an intervening comma,
1000 looks sufficiently ``wrong'' to programmers
1001 that they can't resist the itch to insert the comma.
1002 @samp{READ (I, J), K, L} simply looks cleaner than
1003 @samp{READ (I, J) K, L}.)
1004
1005 So, type-based disambiguation is needed unless strict adherence
1006 to the standard is always assumed, and we're not going to assume that.
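
A minimal C sketch of type-based disambiguation follows.
The names @code{struct symbol} and @code{classify_read}
and the enumerations are hypothetical.
Diagnosing any type other than @code{INTEGER} or @code{CHARACTER}
is an assumption of the sketch, since only those two cases are discussed above.

```c
#include <assert.h>
#include <stddef.h>

enum ftype { TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER };
enum read_form { READ_AS_UNIT, READ_AS_FMT, READ_DIAGNOSE };

/* Hypothetical symbol-table entry for the name inside READ (N).  */
struct symbol
{
  const char *name;
  enum ftype type;
};

/* Choose the interpretation of READ (N) from the type of N.  */
enum read_form
classify_read (const struct symbol *n)
{
  assert (n != NULL);
  switch (n->type)
    {
    case TYPE_INTEGER:
      return READ_AS_UNIT;    /* READ (UNIT=(N)) */
    case TYPE_CHARACTER:
      return READ_AS_FMT;     /* READ (FMT=(N)) */
    default:
      return READ_DIAGNOSE;   /* assumption: neither form applies */
    }
}
```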
1007
1008 @node TBD (Transforming)
1009 @subsection TBD (Transforming)
1010
1011 Continue researching gotchas, designing the transformational process,
1012 and implementing it.
1013
1014 Specific issues to resolve:
1015
1016 @itemize @bullet
1017 @item
Just where should @code{USE} processing take place, if it were implemented?
1019
1020 This gets into the whole issue of how @code{g77} should handle the concept
1021 of modules.
1022 I think GNAT already takes on this issue, but don't know more than that.
1023 Jim Giles has written extensively on @code{comp.lang.fortran}
1024 about his opinions on module handling, as have others.
1025 Jim's views should be taken into account.
1026
1027 Actually, Richard M. Stallman (RMS) also has written up
1028 some guidelines for implementing such things,
1029 but I'm not sure where I read them.
1030 Perhaps the old @email{gcc2@@cygnus.com} list.
1031
1032 If someone could dig references to these up and get them to me,
1033 that would be much appreciated!
Even though modules are not on the short-term list for implementation,
it'd be helpful to know @emph{now} how to avoid making them harder to
implement @emph{later}.
1037
1038 @item
1039 Should the @code{g77} command become just a script that invokes
1040 all the various preprocessing that might be needed,
1041 thus making it seem slower than necessary for legacy code
1042 that people are unwilling to convert,
1043 or should we provide a separate script for that,
1044 thus encouraging people to convert their code once and for all?
1045
At least, a separate script to behave as old @code{g77} did,
perhaps named @code{g77old}, might ease the transition,
as might a corresponding one, perhaps named @code{g77oldnew},
that converts source code.
1050
1051 These scripts would take all the pertinent options @code{g77} used
1052 to take and run the appropriate filters,
1053 passing the results to @code{g77} or just making new sources out of them
1054 (in a subdirectory, leaving the user to do the dirty deed of
1055 moving or copying them over the old sources).
1056
1057 @item
1058 Do other Fortran compilers provide a prefix syntax
1059 to govern the treatment of backslashes in @code{CHARACTER}
1060 (or Hollerith) constants?
1061
1062 Knowing what other compilers provide would help.
1063
1064 @item
1065 Is it okay to drop support for the @samp{-fintrin-case-initcap},
1066 @samp{-fmatch-case-initcap}, @samp{-fsymbol-case-initcap},
1067 and @samp{-fcase-initcap} options?
1068
1069 I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
1072
1073 The consensus to date (1999-11-17) has been to drop this support.
1074 Can't recall anybody saying they're using it, in fact.
1075 @end itemize
1076
1077 @node Philosophy of Code Generation
1078 @section Philosophy of Code Generation
1079
1080 Don't poke the bear.
1081
1082 The @code{g77} front end generates code
1083 via the @code{gcc} back end.
1084
1085 @cindex GNU Back End (GBE)
1086 @cindex GBE
1087 @cindex @code{gcc}, back end
1088 @cindex back end, gcc
1089 @cindex code generator
1090 The @code{gcc} back end (GBE) is a large, complex
1091 labyrinth of intricate code
1092 written in a combination of the C language
1093 and specialized languages internal to @code{gcc}.
1094
1095 While the @emph{code} that implements the GBE
1096 is written in a combination of languages,
1097 the GBE itself is,
1098 to the front end for a language like Fortran,
1099 best viewed as a @emph{compiler}
1100 that compiles its own, unique, language.
1101
1102 The GBE's ``source'', then, is written in this language,
1103 which consists primarily of
1104 a combination of calls to GBE functions
1105 and @dfn{tree} nodes
1106 (which are, themselves, created
1107 by calling GBE functions).
1108
So, @code{g77} generates code by, in effect,
1110 translating the Fortran code it reads
1111 into a form ``written'' in the ``language''
1112 of the @code{gcc} back end.
1113
1114 @cindex GBEL
1115 @cindex GNU Back End Language (GBEL)
This language will hereafter be referred to as @dfn{GBEL},
1117 for GNU Back End Language.
1118
1119 GBEL is an evolving language,
1120 not fully specified in any published form
1121 as of this writing.
1122 It offers many facilities,
1123 but its ``core'' facilities
are those that correspond most directly
1125 to those needed to support @code{gcc}
1126 (compiling code written in GNU C).
1127
1128 The @code{g77} Fortran Front End (FFE)
1129 is designed and implemented
1130 to navigate the currents and eddies
1131 of ongoing GBEL and @code{gcc} development
1132 while also delivering on the potential
1133 of an integrated FFE
1134 (as compared to using a converter like @code{f2c}
1135 and feeding the output into @code{gcc}).
1136
1137 Goals of the FFE's code-generation strategy include:
1138
1139 @itemize @bullet
1140 @item
1141 High likelihood of generation of correct code,
1142 or, failing that, producing a fatal diagnostic or crashing.
1143
1144 @item
1145 Generation of highly optimized code,
1146 as directed by the user
1147 via GBE-specific (versus @code{g77}-specific) constructs,
1148 such as command-line options.
1149
1150 @item
1151 Fast overall (FFE plus GBE) compilation.
1152
1153 @item
1154 Preservation of source-level debugging information.
1155 @end itemize
1156
1157 The strategies historically, and currently, used by the FFE
1158 to achieve these goals include:
1159
1160 @itemize @bullet
1161 @item
1162 Use of GBEL constructs that most faithfully encapsulate
1163 the semantics of Fortran.
1164
1165 @item
1166 Avoidance of GBEL constructs that are so rarely used,
1167 or limited to use in specialized situations not related to Fortran,
that their reliability and performance have not yet been established
1169 as sufficient for use by the FFE.
1170
1171 @item
1172 Flexible design, to readily accommodate changes to specific
1173 code-generation strategies, perhaps governed by command-line options.
1174 @end itemize
1175
1176 @cindex Bear-poking
1177 @cindex Poking the bear
1178 ``Don't poke the bear'' somewhat summarizes the above strategies.
1179 The GBE is the bear.
1180 The FFE is designed and implemented to avoid poking it
1181 in ways that are likely to just annoy it.
1182 The FFE usually either tackles it head-on,
1183 or avoids treating it in ways dissimilar to how
1184 the @code{gcc} front end treats it.
1185
For example, the FFE uses the native array facility in the back end
instead of the lower-level pointer-arithmetic facility
(used by @code{gcc} when compiling @code{f2c} output).
1189 Theoretically, this presents more opportunities for optimization,
1190 faster compile times,
1191 and the production of more faithful debugging information.
1192 These benefits were not, however, immediately realized,
1193 mainly because @code{gcc} itself makes little or no use
1194 of the native array facility.
1195
1196 Complex arithmetic is a case study of the evolution of this strategy.
1197 When originally implemented,
1198 the GBEL had just evolved its own native complex-arithmetic facility,
1199 so the FFE took advantage of that.
1200
1201 When porting @code{g77} to 64-bit systems,
1202 it was discovered that the GBE didn't really
1203 implement its native complex-arithmetic facility properly.
1204
1205 The short-term solution was to rewrite the FFE
1206 to instead use the lower-level facilities
1207 that'd be used by @code{gcc}-compiled code
1208 (assuming that code, itself, didn't use the native complex type
1209 provided, as an extension, by @code{gcc}),
1210 since these were known to work,
1211 and, in any case, if shown to not work,
1212 would likely be rapidly fixed
1213 (since they'd likely not work for vanilla C code in similar circumstances).
1214
1215 However, the rewrite accommodated the original, native approach as well
1216 by offering a command-line option to select it over the emulated approach.
1217 This allowed users, and especially GBE maintainers, to try out
1218 fixes to complex-arithmetic support in the GBE
1219 while @code{g77} continued to default to compiling more code correctly,
1220 albeit producing (typically) slower executables.
1221
1222 As of April 1999, it appeared that the last few bugs
1223 in the GBE's support of its native complex-arithmetic facility
1224 were worked out.
1225 The FFE was changed back to default to using that native facility,
1226 leaving emulation as an option.
1227
1228 Later during the release cycle
1229 (which was called EGCS 1.2, but soon became GCC 2.95),
1230 bugs in the native facility were found.
1231 Reactions among various people included
1232 ``the last thing we should do is change the default back'',
1233 ``we must change the default back'',
1234 and ``let's figure out whether we can narrow down the bugs to
1235 few enough cases to allow the now-months-long-tested default
1236 to remain the same''.
The last viewpoint won that particular time.
1238 The bugs exposed other concerns regarding ABI compliance
1239 when the ABI specified treatment of complex data as different
1240 from treatment of what Fortran and GNU C consider the equivalent
1241 aggregation (structure) of real (or float) pairs.
1242
1243 Other Fortran constructs---arrays, character strings,
1244 complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
1245 and so on---involve issues similar to those pertaining to complex arithmetic.
1246
1247 So, it is possible that the history
1248 of how the FFE handled complex arithmetic
1249 will be repeated, probably in modified form
1250 (and hopefully over shorter timeframes),
1251 for some of these other facilities.
1252
1253 @node Two-pass Design
1254 @section Two-pass Design
1255
1256 The FFE does not tell the GBE anything about a program unit
1257 until after the last statement in that unit has been parsed.
1258 (A program unit is a Fortran concept that corresponds, in the C world,
most closely to function definitions in ISO C.
1260 That is, a program unit in Fortran is like a top-level function in C.
1261 Nested functions, found among the extensions offered by GNU C,
1262 correspond roughly to Fortran's statement functions.)
1263
1264 So, while parsing the code in a program unit,
1265 the FFE saves up all the information
1266 on statements, expressions, names, and so on,
1267 until it has seen the last statement.
1268
1269 At that point, the FFE revisits the saved information
1270 (in what amounts to a second @dfn{pass} over the program unit)
1271 to perform the actual translation of the program unit into GBEL,
culminating in the generation of assembly code for it.
1273
1274 Some lookahead is performed during this second pass,
1275 so the FFE could be viewed as a ``two-plus-pass'' design.
1276
1277 @menu
1278 * Two-pass Code::
1279 * Why Two Passes::
1280 @end menu
1281
1282 @node Two-pass Code
1283 @subsection Two-pass Code
1284
1285 Most of the code that turns the first pass (parsing)
1286 into a second pass for code generation
1287 is in @file{@value{path-g77}/std.c}.
1288
1289 It has external functions,
1290 called mainly by siblings in @file{@value{path-g77}/stc.c},
1291 that record the information on statements and expressions
1292 in the order they are seen in the source code.
1293 These functions save that information.
1294
1295 It also has an external function that revisits that information,
1296 calling the siblings in @file{@value{path-g77}/ste.c},
1297 which handles the actual code generation
1298 (by generating GBEL code,
1299 that is, by calling GBE routines
1300 to represent and specify expressions, statements, and so on).
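
The record-then-revisit arrangement can be sketched in C as follows.
This is not the code in @file{std.c}:
the types and functions (@code{struct saved_stmt}, @code{struct stmt_log},
@code{log_record}, @code{log_replay}, @code{trace_kind}) are hypothetical,
the saved information is reduced to two integers,
and a callback stands in for the code-generation routines.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SAVED 64

/* Hypothetical saved statement; a real front end records far more.  */
struct saved_stmt
{
  int kind;      /* statement kind, meaningful to the caller */
  int operand;   /* stand-in for saved expression information */
};

struct stmt_log
{
  struct saved_stmt stmts[MAX_SAVED];
  size_t count;
};

/* First pass: called as each statement is parsed.  Returns 0 if the
   log is full, 1 otherwise.  */
int
log_record (struct stmt_log *log, int kind, int operand)
{
  assert (log != NULL);
  if (log->count == MAX_SAVED)
    return 0;
  log->stmts[log->count].kind = kind;
  log->stmts[log->count].operand = operand;
  ++log->count;
  return 1;
}

/* Second pass: called once the last statement of the program unit
   has been seen.  EMIT stands in for the code-generation routines.  */
void
log_replay (struct stmt_log *log,
            void (*emit) (const struct saved_stmt *, void *), void *data)
{
  size_t i;

  assert (log != NULL && emit != NULL);
  for (i = 0; i < log->count; ++i)
    emit (&log->stmts[i], data);
  log->count = 0;
}

/* Example emitter: appends each statement's kind to a trace buffer.  */
struct kind_trace
{
  int kinds[MAX_SAVED];
  size_t n;
};

void
trace_kind (const struct saved_stmt *s, void *data)
{
  struct kind_trace *t = data;

  if (t->n < MAX_SAVED)
    t->kinds[t->n++] = s->kind;
}
```

Nothing reaches the emitter until @code{log_replay} is called,
which mirrors the FFE telling the GBE nothing about a program unit
until its last statement has been parsed.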
1301
1302 @node Why Two Passes
1303 @subsection Why Two Passes
1304
1305 The need for two passes was not immediately evident
1306 during the design and implementation of the code in the FFE
1307 that was to produce GBEL.
1308 Only after a few kludges,
1309 to handle things like incorrectly-guessed @code{ASSIGN} label nature,
1310 had been implemented,
1311 did enough evidence pile up to make it clear
1312 that @file{std.c} had to be introduced to intercept,
1313 save, then revisit as part of a second pass,
1314 the digested contents of a program unit.
1315
1316 Other such missteps have occurred during the evolution of the FFE,
1317 because of the different goals of the FFE and the GBE.
1318
1319 Because the GBE's original, and still primary, goal
1320 was to directly support the GNU C language,
the GBEL, and the GBE itself,
require more complexity
on the part of most front ends
than they require of @code{gcc}'s.
1325
1326 For example,
1327 the GBEL offers an interface that permits the @code{gcc} front end
1328 to implement most, or all, of the language features it supports,
1329 without the front end having to
1330 make use of non-user-defined variables.
1331 (It's almost certainly the case that all of K&R C,
1332 and probably ANSI C as well,
1333 is handled by the @code{gcc} front end
1334 without declaring such variables.)
1335
1336 The FFE, on the other hand, must resort to a variety of ``tricks''
1337 to achieve its goals.
1338
1339 Consider the following C code:
1340
1341 @smallexample
1342 int
1343 foo (int a, int b)
1344 @{
1345   int c = 0;
1346
1347   if ((c = bar (c)) == 0)
1348     goto done;
1349
1350   quux (c << 1);
1351
1352 done:
1353   return c;
1354 @}
1355 @end smallexample
1356
1357 Note what kinds of objects are declared, or defined, before their use,
1358 and before any actual code generation involving them
1359 would normally take place:
1360
1361 @itemize @bullet
1362 @item
1363 Return type of function
1364
1365 @item
1366 Entry point(s) of function
1367
1368 @item
1369 Dummy arguments
1370
1371 @item
1372 Variables
1373
1374 @item
1375 Initial values for variables
1376 @end itemize
1377
1378 Whereas, the following items can, and do,
1379 suddenly appear ``out of the blue'' in C:
1380
1381 @itemize @bullet
1382 @item
1383 Label references
1384
1385 @item
1386 Function references
1387 @end itemize
1388
1389 Not surprisingly, the GBE faithfully permits the latter set of items
1390 to be ``discovered'' partway through GBEL ``programs'',
1391 just as they are permitted to in C.
1392
1393 Yet, the GBE has tended, at least in the past,
to be reluctant to fully support similar ``late'' discovery
1395 of items in the former set.
1396
1397 This makes Fortran a poor fit for the ``safe'' subset of GBEL.
1398 Consider:
1399
1400 @smallexample
1401       FUNCTION X (A, ARRAY, ID1)
1402       CHARACTER*(*) A
1403       DOUBLE PRECISION X, Y, Z, TMP, EE, PI
1404       REAL ARRAY(ID1*ID2)
1405       COMMON ID2
1406       EXTERNAL FRED
1407
1408       ASSIGN 100 TO J
1409       CALL FOO (I)
1410       IF (I .EQ. 0) PRINT *, A(0)
1411       GOTO 200
1412
1413       ENTRY Y (Z)
1414       ASSIGN 101 TO J
1415 200   PRINT *, A(1)
1416       READ *, TMP
1417       GOTO J
1418 100   X = TMP * EE
1419       RETURN
1420 101   Y = TMP * PI
1421       CALL FRED
1422       DATA EE, PI /2.71D0, 3.14D0/
1423       END
1424 @end smallexample
1425
1426 Here are some observations about the above code,
1427 which, while somewhat contrived,
1428 conforms to the FORTRAN 77 and Fortran 90 standards:
1429
1430 @itemize @bullet
1431 @item
1432 The return type of function @samp{X} is not known
1433 until the @samp{DOUBLE PRECISION} line has been parsed.
1434
1435 @item
1436 Whether @samp{A} is a function or a variable
1437 is not known until the @samp{PRINT *, A(0)} statement
1438 has been parsed.
1439
1440 @item
1441 The bounds of the array of argument @samp{ARRAY}
1442 depend on a computation involving
1443 the subsequent argument @samp{ID1}
1444 and the blank-common member @samp{ID2}.
1445
1446 @item
1447 Whether @samp{Y} and @samp{Z} are local variables,
1448 additional function entry points,
1449 or dummy arguments to additional entry points
1450 is not known
1451 until the @code{ENTRY} statement is parsed.
1452
1453 @item
1454 Similarly, whether @samp{TMP} is a local variable is not known
1455 until the @samp{READ *, TMP} statement is parsed.
1456
1457 @item
1458 The initial values for @samp{EE} and @samp{PI}
1459 are not known until after the @code{DATA} statement is parsed.
1460
1461 @item
1462 Whether @samp{FRED} is a function returning type @code{REAL}
1463 or a subroutine
1464 (which can be thought of as returning type @code{void}
1465 @emph{or}, to support alternate returns in a simple way,
1466 type @code{int})
1467 is not known
1468 until the @samp{CALL FRED} statement is parsed.
1469
1470 @item
1471 Whether @samp{100} is a @code{FORMAT} label
1472 or the label of an executable statement
1473 is not known
1474 until the @samp{X =} statement is parsed.
1475 (These two types of labels get @emph{very} different treatment,
1476 especially when @code{ASSIGN}'ed.)
1477
1478 @item
1479 That @samp{J} is a local variable is not known
1480 until the first @code{ASSIGN} statement is parsed.
1481 (This happens @emph{after} executable code has been seen.)
1482 @end itemize
1483
1484 Very few of these ``discoveries''
1485 can be accommodated by the GBE as it has evolved over the years.
1486 The GBEL doesn't support several of them,
1487 and those it might appear to support
1488 don't always work properly,
1489 especially in combination with other GBEL and GBE features,
1490 as implemented in the GBE.
1491
1492 (Had the GBE and its GBEL originally evolved to support @code{g77},
1493 the shoe would be on the other foot, so to speak---most, if not all,
1494 of the above would be directly supported by the GBEL,
1495 and a few C constructs would probably not, as they are in reality,
1496 be supported.
1497 Both this mythical, and today's real, GBE caters to its GBEL
1498 by, sometimes, scrambling around, cleaning up after itself---after
1499 discovering that assumptions it made earlier during code generation
1500 are incorrect.
1501 That's not a great design, since it indicates significant code
1502 paths that might be rarely tested but used in some key production
1503 environments.)
1504
1505 So, the FFE handles these discrepancies---between the order in which
1506 it discovers facts about the code it is compiling,
1507 and the order in which the GBEL and GBE support such discoveries---by
1508 performing what amounts to two
1509 passes over each program unit.
1510
1511 (A few ambiguities can remain at that point,
1512 such as whether, given @samp{EXTERNAL BAZ}
1513 and no other reference to @samp{BAZ} in the program unit,
1514 it is a subroutine, a function, or a block-data---which, in C-speak,
1515 governs its declared return type.
1516 Fortunately, these distinctions are easily finessed
1517 for the procedure, library, and object-file interfaces
1518 supported by @code{g77}.)
1519
1520 @node Challenges Posed
1521 @section Challenges Posed
1522
1523 Consider the following Fortran code, which uses various extensions
1524 (including some to Fortran 90):
1525
1526 @smallexample
1527 SUBROUTINE X(A)
1528 CHARACTER*(*) A
1529 COMPLEX CFUNC
1530 INTEGER*2 CLOCKS(200)
1531 INTEGER IFUNC
1532
1533 CALL SYSTEM_CLOCK (CLOCKS (IFUNC (CFUNC ('('//A//')'))))
1534 @end smallexample
1535
1536 The above poses the following challenges to any Fortran compiler
1537 that uses run-time interfaces, and a run-time library, roughly similar
1538 to those used by @code{g77}:
1539
1540 @itemize @bullet
1541 @item
1542 Assuming the library routine that supports @code{SYSTEM_CLOCK}
1543 expects to set an @code{INTEGER*4} variable via its @code{COUNT} argument,
1544 the compiler must make available to it a temporary variable of that type.
1545
1546 @item
1547 Further, after the @code{SYSTEM_CLOCK} library routine returns,
1548 the compiler must ensure that the temporary variable it wrote
1549 is copied into the appropriate element of the @samp{CLOCKS} array.
1550 (This assumes the compiler doesn't just reject the code,
1551 which it should if it is compiling under some kind of a ``strict'' option.)
1552
1553 @item
1554 To determine the correct index into the @samp{CLOCKS} array,
1555 (putting aside the fact that the index, in this particular case,
1556 need not be computed until after
1557 the @code{SYSTEM_CLOCK} library routine returns),
1558 the compiler must ensure that the @code{IFUNC} function is called.
1559
1560 That requires evaluating its argument,
1561 which requires, for @code{g77}
1562 (assuming @code{-ff2c} is in force),
1563 reserving a temporary variable of type @code{COMPLEX}
1564 for use as a repository for the return value
1565 being computed by @samp{CFUNC}.
1566
1567 @item
1568 Before invoking @samp{CFUNC},
1569 is argument must be evaluated,
1570 which requires allocating, at run time,
1571 a temporary large enough to hold the result of the concatenation,
1572 as well as actually performing the concatenation.
1573
1574 @item
1575 The large temporary needed during invocation of @code{CFUNC}
1576 should, ideally, be deallocated
1577 (or, at least, left to the GBE to dispose of, as it sees fit)
1578 as soon as @code{CFUNC} returns,
1579 which means before @code{IFUNC} is called
1580 (as it might need a lot of dynamically allocated memory).
1581 @end itemize
1582
1583 @code{g77} currently doesn't support all of the above,
1584 but, so that it might someday, it has evolved to handle
1585 at least some of the above requirements.
1586
1587 Meeting the above requirements is made more challenging
1588 by conforming to the requirements of the GBEL/GBE combination.
1589
1590 @node Transforming Statements
1591 @section Transforming Statements
1592
1593 Most Fortran statements are given their own block,
1594 and, for temporary variables they might need, their own scope.
1595 (A block is what distinguishes @samp{@{ foo (); @}}
1596 from just @samp{foo ();} in C.
1597 A scope is included with every such block,
1598 providing a distinct name space for local variables.)
1599
1600 Label definitions for the statement precede this block,
1601 so @samp{10 PRINT *, I} is handled more like
1602 @samp{fl10: @{ @dots{} @}} than @samp{@{ fl10: @dots{} @}}
1603 (where @samp{fl10} is just a notation meaning ``Fortran Label 10''
1604 for the purposes of this document).
1605
1606 @menu
1607 * Statements Needing Temporaries::
1608 * Transforming DO WHILE::
1609 * Transforming Iterative DO::
1610 * Transforming Block IF::
1611 * Transforming SELECT CASE::
1612 @end menu
1613
1614 @node Statements Needing Temporaries
1615 @subsection Statements Needing Temporaries
1616
1617 Any temporaries needed during, but not beyond,
1618 execution of a Fortran statement,
1619 are made local to the scope of that statement's block.
1620
1621 This allows the GBE to share storage for these temporaries
1622 among the various statements without the FFE
1623 having to manage that itself.
1624
1625 (The GBE could, of course, decide to optimize
1626 management of these temporaries.
1627 For example, it could, theoretically,
1628 schedule some of the computations involving these temporaries
1629 to occur in parallel.
1630 More practically, it might leave the storage for some temporaries
1631 ``live'' beyond their scopes, to reduce the number of
1632 manipulations of the stack pointer at run time.)
1633
1634 Temporaries needed across distinct statement boundaries usually
1635 are associated with Fortran blocks (such as @code{DO}/@code{END DO}).
1636 (Also, there might be temporaries not associated with blocks at all---these
1637 would be in the scope of the entire program unit.)
1638
1639 Each Fortran block @emph{should} get its own block/scope in the GBE.
1640 This is best, because it allows temporaries to be more naturally handled.
However, it might pose problems when handling labels
(in particular, when they're the targets of @code{GOTO}s outside the Fortran
block), and it generally means the hassle of replicating
parts of the @code{gcc} front end
(because the FFE needs to support
an arbitrary number of nested back-end blocks
if each Fortran block gets one).
1648
1649 So, there might still be a need for top-level temporaries, whose
1650 ``owning'' scope is that of the containing procedure.
1651
1652 Also, there seems to be problems declaring new variables after
1653 generating code (within a block) in the back end, leading to, e.g.,
1654 @samp{label not defined before binding contour} or similar messages,
1655 when compiling with @samp{-fstack-check} or
1656 when compiling for certain targets.
1657
1658 Because of that, and because sometimes these temporaries are not
1659 discovered until in the middle of of generating code for an expression
1660 statement (as in the case of the optimization for @samp{X**I}),
1661 it seems best to always
1662 pre-scan all the expressions that'll be expanded for a block
1663 before generating any of the code for that block.
1664
1665 This pre-scan then handles discovering and declaring, to the back end,
1666 the temporaries needed for that block.
1667
1668 It's also important to treat distinct items in an I/O list as distinct
1669 statements deserving their own blocks.
1670 That's because there's a requirement
1671 that each I/O item be fully processed before the next one,
1672 which matters in cases like @samp{READ (*,*), I, A(I)}---the
1673 element of @samp{A} read in the second item
1674 @emph{must} be determined from the value
1675 of @samp{I} read in the first item.
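
The ordering requirement can be illustrated with a small C sketch
in which each I/O item gets its own block.
The input stream type @code{struct int_stream} and the functions
@code{read_int} and @code{read_i_then_a_of_i} are hypothetical
stand-ins for the run-time library,
and Fortran's 1-based @samp{A(I)} is mapped to a 0-based C array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input stream of already-converted integers.  */
struct int_stream
{
  const int *values;
  size_t next, count;
};

static int
read_int (struct int_stream *in)
{
  assert (in->next < in->count);
  return in->values[in->next++];
}

/* Sketch of READ (*,*) I, A(I): each item gets its own block, and
   the subscript of A is evaluated only after I has been stored.
   Fortran's 1-based A(I) maps to a[i - 1] here.  */
void
read_i_then_a_of_i (struct int_stream *in, int *i, int *a, size_t a_len)
{
  {
    *i = read_int (in);            /* first item: I */
  }
  {
    assert (*i >= 1 && (size_t) *i <= a_len);
    a[*i - 1] = read_int (in);     /* second item: A(I), using the new I */
  }
}
```

Had the subscript been evaluated before the first item was read,
the second value would have landed in the element selected by the old @samp{I}.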
1676
1677 @node Transforming DO WHILE
1678 @subsection Transforming DO WHILE
1679
1680 @samp{DO WHILE(expr)} @emph{must} be implemented
1681 so that temporaries needed to evaluate @samp{expr}
1682 are generated just for the test, each time.
1683
1684 Consider how @samp{DO WHILE (A//B .NE. 'END'); @dots{}; END DO} is transformed:
1685
1686 @smallexample
for (;;)
  @{
    int temp0;

    @{
      char temp1[large];

      libg77_catenate (temp1, a, b);
      temp0 = libg77_ne (temp1, 'END');
    @}

    if (! temp0)
      break;

    @dots{}
  @}
1703 @end smallexample
1704
1705 In this case, it seems like a time/space tradeoff
1706 between allocating and deallocating @samp{temp1} for each iteration
1707 and allocating it just once for the entire loop.
1708
1709 However, if @samp{temp1} is allocated just once for the entire loop,
1710 it could be the wrong size for subsequent iterations of that loop
1711 in cases like @samp{DO WHILE (A(I:J)//B .NE. 'END')},
1712 because the body of the loop might modify @samp{I} or @samp{J}.
1713
1714 So, the above implementation is used,
though a more efficient one can be used
1716 in specific circumstances.
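
The sizing problem can be sketched in C as follows.
This is only an illustration under stated assumptions:
@code{do_while_test} is a hypothetical helper, not something the FFE emits,
the operands are NUL-terminated C strings,
and Fortran's blank padding during comparison is ignored.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of the test in DO WHILE (A(I:J)//B .NE. 'END').
   Returns nonzero if the loop should continue.  The temporary is
   sized from the current I and J on every call, then released.
   Simplification: b is a NUL-terminated string, and Fortran's blank
   padding of the shorter operand is ignored.  */
int
do_while_test (const char *a, int i, int j, const char *b)
{
  size_t sub_len, b_len;
  char *temp1;
  int temp0;

  assert (i >= 1 && i <= j);
  sub_len = (size_t) (j - i + 1);
  b_len = strlen (b);

  temp1 = malloc (sub_len + b_len + 1);
  if (temp1 == NULL)
    abort ();

  memcpy (temp1, a + (i - 1), sub_len);   /* A(I:J) */
  memcpy (temp1 + sub_len, b, b_len + 1); /* //B, plus the NUL */
  temp0 = strcmp (temp1, "END") != 0;

  free (temp1);
  return temp0;
}
```

Because the length is recomputed on each call,
a loop body that changes @samp{I} or @samp{J} cannot leave the temporary
with the wrong size.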
1717
1718 @node Transforming Iterative DO
1719 @subsection Transforming Iterative DO
1720
1721 An iterative @code{DO} loop
1722 (one that specifies an iteration variable)
1723 is required by the Fortran standards
1724 to be implemented as though an iteration count
1725 is computed before entering the loop body,
1726 and that iteration count used to determine
1727 the number of times the loop body is to be performed
1728 (assuming the loop isn't cut short via @code{GOTO} or @code{EXIT}).
1729
1730 The FFE handles this by allocating a temporary variable
1731 to contain the computed number of iterations.
1732 Since this variable must be in a scope that includes the entire loop,
1733 a GBEL block is created for that loop,
1734 and the variable declared as belonging to the scope of that block.
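
A minimal C sketch of this arrangement follows.
The iteration count uses the formula given by the Fortran 77 standard,
@samp{MAX (INT ((m2 - m1 + m3) / m3), 0)}.
The function names are invented for this example,
and the loop body (summing the iteration variable) is just a placeholder.

```c
#include <assert.h>

/* Iteration count for DO var = m1, m2, m3, following the Fortran 77
   formula MAX (INT ((m2 - m1 + m3) / m3), 0).  C integer division
   truncates toward zero, as INT does.  */
long
do_trip_count (long m1, long m2, long m3)
{
  long n;

  assert (m3 != 0);
  n = (m2 - m1 + m3) / m3;
  return n > 0 ? n : 0;
}

/* Sketch of DO I = m1, m2, m3 with a placeholder body that sums I.
   Stores the value I holds after the loop in *final_i.  */
long
do_loop_sum (long m1, long m2, long m3, long *final_i)
{
  long sum = 0;

  {
    /* Block that owns the iteration-count temporary; its scope
       covers the entire loop.  */
    long count = do_trip_count (m1, m2, m3);
    long i = m1;

    while (count-- > 0)
      {
        sum += i;               /* loop body */
        i += m3;
      }

    *final_i = i;
  }

  return sum;
}
```

Since the count is fixed before the first iteration,
changes the body makes to the bounds or the step do not affect
how many times the body runs.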
1735
1736 @node Transforming Block IF
1737 @subsection Transforming Block IF
1738
1739 Consider:
1740
1741 @smallexample
SUBROUTINE X(A,B,C)
CHARACTER*(*) A, B, C
LOGICAL LFUNC

IF (LFUNC (A//B)) THEN
  CALL SUBR1
ELSE IF (LFUNC (A//C)) THEN
  CALL SUBR2
ELSE
  CALL SUBR3
END IF
END
1753 @end smallexample
1754
1755 The arguments to the two calls to @samp{LFUNC}
1756 require dynamic allocation (at run time),
1757 but are not required during execution of the @code{CALL} statements.
1758
1759 So, the scopes of those temporaries must be within blocks inside
1760 the block corresponding to the Fortran @code{IF} block.
1761
1762 This cannot be represented ``naturally''
1763 in vanilla C, nor in GBEL.
1764 The @code{if}, @code{elseif}, @code{else},
1765 and @code{endif} constructs
1766 provided by both languages must,
1767 for a given @code{if} block,
1768 share the same C/GBE block.
1769
1770 Therefore, any temporaries needed during evaluation of @samp{expr}
1771 while executing @samp{ELSE IF(expr)}
1772 must either have been predeclared
1773 at the top of the corresponding @code{IF} block,
1774 or declared within a new block for that @code{ELSE IF}---a block that,
1775 since it cannot contain the @code{else} or @code{else if} itself
1776 (due to the above requirement),
1777 actually implements the rest of the @code{IF} block's
1778 @code{ELSE IF} and @code{ELSE} statements
1779 within an inner block.
1780
1781 The FFE takes the latter approach.
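
The nesting that results can be sketched in C roughly as follows.
This is an illustration only:
@code{catenate}, @code{lfunc}, and @code{block_if} are hypothetical names,
@code{lfunc} has an arbitrary made-up definition,
and the returned number merely records which @samp{SUBR} would be called.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: concatenate into a freshly allocated temporary.  */
static char *
catenate (const char *x, const char *y)
{
  char *temp = malloc (strlen (x) + strlen (y) + 1);

  if (temp == NULL)
    abort ();
  strcpy (temp, x);
  strcat (temp, y);
  return temp;
}

/* Hypothetical stand-in for LFUNC: true if the string starts with 'Y'.  */
static int
lfunc (const char *s)
{
  return s[0] == 'Y';
}

/* Returns 1, 2, or 3 according to which SUBRn would be called.  */
int
block_if (const char *a, const char *b, const char *c)
{
  int test;

  {
    /* Temporary for A//B lives only while the IF test is evaluated.  */
    char *temp = catenate (a, b);

    test = lfunc (temp);
    free (temp);
  }

  if (test)
    return 1;                   /* CALL SUBR1 */
  else
    {
      /* Inner block implementing the rest of the IF block.  */
      {
        /* Temporary for A//C, again released before any CALL.  */
        char *temp = catenate (a, c);

        test = lfunc (temp);
        free (temp);
      }

      if (test)
        return 2;               /* CALL SUBR2 */
      else
        return 3;               /* CALL SUBR3 */
    }
}
```

Each further @code{ELSE IF} that needs temporaries would add
one more level of nesting inside the preceding @code{else}.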
1782
1783 @node Transforming SELECT CASE
1784 @subsection Transforming SELECT CASE
1785
1786 @code{SELECT CASE} poses a few interesting problems for code generation,
1787 if efficiency and frugal stack management are important.
1788
1789 Consider @samp{SELECT CASE (I('PREFIX'//A))},
1790 where @samp{A} is @code{CHARACTER*(*)}.
1791 In a case like this---basically,
1792 in any case where largish temporaries are needed
1793 to evaluate the expression---those temporaries should
1794 not be ``live'' during execution of any of the @code{CASE} blocks.
1795
1796 So, evaluation of the expression is best done within its own block,
1797 which in turn is within the @code{SELECT CASE} block itself
(which contains the code for the @code{CASE} blocks as well,
though each within its own block).
1800
1801 Otherwise, we'd have the rough equivalent of this pseudo-code:
1802
1803 @smallexample
@{
  char temp[large];

  libg77_catenate (temp, 'prefix', a);

  switch (i (temp))
    @{
    case 0:
      @dots{}
    @}
@}
1815 @end smallexample
1816
And that would leave @samp{temp} in scope during the @code{CASE} blocks
(although a clever back end @emph{could} see that it isn't referenced
in them, and thus free that temporary before executing the blocks).
1820
1821 So this approach is used instead:
1822
1823 @smallexample
@{
  int temp0;

  @{
    char temp1[large];

    libg77_catenate (temp1, 'prefix', a);
    temp0 = i (temp1);
  @}

  switch (temp0)
    @{
    case 0:
      @dots{}
    @}
@}
1840 @end smallexample
1841
1842 Note how @samp{temp1} goes out of scope before starting the switch,
1843 thus making it easy for a back end to free it.
1844
1845 The problem @emph{that} solution has, however,
1846 is with @samp{SELECT CASE('prefix'//A)}
1847 (which is currently not supported).
1848
1849 Unless the GBEL is extended to support arbitrarily long character strings
1850 in its @code{case} facility,
1851 the FFE has to implement @code{SELECT CASE} on @code{CHARACTER}
1852 (probably excepting @code{CHARACTER*1})
1853 using a cascade of
1854 @code{if}, @code{elseif}, @code{else}, and @code{endif} constructs
1855 in GBEL.
1856
1857 To prevent the (potentially large) temporary,
1858 needed to hold the selected expression itself (@samp{'prefix'//A}),
1859 from being in scope during execution of the @code{CASE} blocks,
1860 two approaches are available:
1861
1862 @itemize @bullet
1863 @item
1864 Pre-evaluate all the @code{CASE} tests,
1865 producing an integer ordinal that is used,
1866 a la @samp{temp0} in the earlier example,
1867 as if @samp{SELECT CASE(temp0)} had been written.
1868
1869 Each corresponding @code{CASE} is replaced with @samp{CASE(@var{i})},
1870 where @var{i} is the ordinal for that case,
1871 determined while, or before,
1872 generating the cascade of @code{if}-related constructs
1873 to cope with @code{CHARACTER} selection.
1874
1875 @item
1876 Make @samp{temp0} above just
1877 large enough to hold the longest @code{CASE} string
1878 that'll actually be compared against the expression
1879 (in this case, @samp{'prefix'//A}).
1880
1881 Since that length must be constant
1882 (because @code{CASE} expressions are all constant),
1883 it won't be so large,
1884 and, further, @samp{temp1} need not be dynamically allocated,
1885 since normal @code{CHARACTER} assignment can be used
1886 into the fixed-length @samp{temp0}.
1887 @end itemize
1888
1889 Both of these solutions require @code{SELECT CASE} implementation
1890 to be changed so all the corresponding @code{CASE} statements
1891 are seen during the actual code generation for @code{SELECT CASE}.
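
The first approach (pre-evaluating the tests into an integer ordinal)
might look roughly like the following C sketch.
It is not current FFE behavior.
The @code{CASE} strings, the buffer size, the return values,
and the name @code{select_case_char} are all assumptions made for the example,
and blank padding in Fortran comparisons is ignored.

```c
#include <assert.h>
#include <string.h>

/* Sketch of SELECT CASE ('prefix'//A) with hypothetical cases
   CASE ('prefixA'), CASE ('prefixB'), and CASE DEFAULT.
   Returns a made-up value identifying the CASE block that ran.  */
int
select_case_char (const char *a)
{
  int temp0;                    /* ordinal of the matching CASE */

  {
    /* The string temporary is in scope only while the ordinal
       is being computed.  */
    char temp1[64];

    assert (strlen (a) < sizeof temp1 - strlen ("prefix"));
    strcpy (temp1, "prefix");
    strcat (temp1, a);

    /* Cascade of tests, one per CASE string.  */
    if (strcmp (temp1, "prefixA") == 0)
      temp0 = 0;
    else if (strcmp (temp1, "prefixB") == 0)
      temp0 = 1;
    else
      temp0 = -1;               /* CASE DEFAULT */
  }

  /* Equivalent of SELECT CASE (temp0).  */
  switch (temp0)
    {
    case 0:
      return 100;
    case 1:
      return 200;
    default:
      return 0;
    }
}
```

Writing the cascade requires knowing every @code{CASE} string up front,
which is why the corresponding @code{CASE} statements must be seen
during code generation for @code{SELECT CASE}.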
1892
1893 @node Transforming Expressions
1894 @section Transforming Expressions
1895
1896 The interactions between statements, expressions, and subexpressions
1897 at program run time can be viewed as:
1898
1899 @smallexample
1900 @var{action}(@var{expr})
1901 @end smallexample
1902
1903 Here, @var{action} is the series of steps
1904 performed to effect the statement,
1905 and @var{expr} is the expression
1906 whose value is used by @var{action}.
1907
1908 Expanding the above shows a typical order of events at run time:
1909
1910 @smallexample
1911 Evaluate @var{expr}
1912 Perform @var{action}, using result of evaluation of @var{expr}
1913 Clean up after evaluating @var{expr}
1914 @end smallexample
1915
1916 So, if evaluating @var{expr} requires allocating memory,
1917 that memory can be freed before performing @var{action}
1918 only if it is not needed to hold the result of evaluating @var{expr}.
1919 Otherwise, it must be freed no sooner than
1920 after @var{action} has been performed.
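
The two cases can be sketched in C as follows.
Everything here is hypothetical scaffolding for illustration:
the counters exist only so the example can record how many temporaries
are live when the @var{action} runs,
and the actions themselves are trivial placeholders.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int live_temps;          /* temporaries currently allocated */
static int live_during_action;  /* snapshot taken when the action runs */

static char *
temp_catenate (const char *x, const char *y)
{
  char *temp = malloc (strlen (x) + strlen (y) + 1);

  if (temp == NULL)
    abort ();
  strcpy (temp, x);
  strcat (temp, y);
  live_temps++;
  return temp;
}

static void
temp_free (char *temp)
{
  free (temp);
  live_temps--;
}

/* Placeholder actions: record the live count, return a length.  */
static size_t
action_on_string (const char *s)
{
  live_during_action = live_temps;
  return strlen (s);
}

static size_t
action_on_length (size_t n)
{
  live_during_action = live_temps;
  return n;
}

/* The temporary holds the value of expr itself, so it must stay
   allocated until the action has been performed.  */
size_t
stmt_result_in_temp (const char *a, const char *b)
{
  char *temp = temp_catenate (a, b);    /* evaluate expr */
  size_t r = action_on_string (temp);   /* perform action */

  temp_free (temp);                     /* clean up */
  return r;
}

/* The temporary holds only an intermediate value, so it can be
   freed before the action is performed.  */
size_t
stmt_result_not_in_temp (const char *a, const char *b)
{
  size_t len;

  {
    char *temp = temp_catenate (a, b);  /* evaluate subexpression */

    len = strlen (temp);                /* result of expr */
    temp_free (temp);                   /* clean up early */
  }

  return action_on_length (len);        /* perform action */
}
```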
1921
1922 The above are recursive definitions,
1923 in the sense that they apply to subexpressions of @var{expr}.
1924
1925 That is, evaluating @var{expr} involves
1926 evaluating all of its subexpressions,
1927 performing the @var{action} that computes the
1928 result value of @var{expr},
1929 then cleaning up after evaluating those subexpressions.
1930
1931 The recursive nature of this evaluation is implemented
1932 via recursive-descent transformation of the top-level statements,
1933 their expressions, @emph{their} subexpressions, and so on.
1934
1935 However, that recursive-descent transformation is,
1936 due to the nature of the GBEL,
1937 focused primarily on generating a @emph{single} stream of code
1938 to be executed at run time.
1939
1940 Yet, from the above, it's clear that multiple streams of code
1941 must effectively be simultaneously generated
1942 during the recursive-descent analysis of statements.
1943
1944 The primary stream implements the primary @var{action} items,
1945 while at least two other streams implement
1946 the evaluation and clean-up items.
1947
1948 Requirements imposed by expressions include:
1949
1950 @itemize @bullet
1951 @item
1952 Whether the caller needs to have a temporary ready
1953 to hold the value of the expression.
1954
1955 @item
Other requirements are TBD.
1957 @end itemize
1958
1959 @node Internal Naming Conventions
1960 @section Internal Naming Conventions
1961
1962 Names exported by FFE modules have the following (regular-expression) forms.
1963 Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
1964 where @var{mod} is lowercase or uppercase alphanumerics, respectively,
1965 are exported by the module @code{ffe@var{mod}},
1966 with the source code doing the exporting in @file{@var{mod}.h}.
1967 (Usually, the source code for the implementation is in @file{@var{mod}.c}.)
1968
1969 Identifiers that don't fit the following forms
1970 are not considered exported,
1971 even if they are according to the C language.
1972 (For example, they might be made available to other modules
1973 solely for use within expansions of exported macros,
1974 not for use within any source code in those other modules.)
1975
1976 @table @code
1977 @item ffe@var{mod}
1978 The single typedef exported by the module.
1979
1980 @item FFE@var{umod}_[A-Z][A-Z0-9_]*
(Where @var{umod} is the uppercase form of @var{mod}.)
1982
1983 A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.
1984
1985 @item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
1986 A typedef exported by the module.
1987
1988 The portion of the identifier after @code{ffe@var{mod}} is
1989 referred to as @code{ctype}, a capitalized (mixed-case) form
1990 of @code{type}.
1991
1992 @item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
(Where @var{umod} is the uppercase form of @var{mod}.)
1994
1995 A @code{#define} or @code{enum} constant of the type
1996 @code{ffe@var{mod}@var{type}},
1997 where @var{type} is the lowercase form of @var{ctype}
1998 in an exported typedef.
1999
2000 @item ffe@var{mod}_@var{value}
2001 A function that does or returns something,
2002 as described by @var{value} (see below).
2003
2004 @item ffe@var{mod}_@var{value}_@var{input}
2005 A function that does or returns something based
2006 primarily on the thing described by @var{input} (see below).
2007 @end table
2008
2009 Below are names used for @var{value} and @var{input},
2010 along with their definitions.
2011
2012 @table @code
2013 @item col
2014 A column number within a line (first column is number 1).
2015
2016 @item file
2017 An encapsulation of a file's name.
2018
2019 @item find
2020 Looks up an instance of some type that matches specified criteria,
2021 and returns that, even if it has to create a new instance or
2022 crash trying to find it (as appropriate).
2023
2024 @item initialize
2025 Initializes, usually a module.  No type.
2026
2027 @item int
2028 A generic integer of type @code{int}.
2029
2030 @item is
2031 A generic integer that contains a true (non-zero) or false (zero) value.
2032
2033 @item len
2034 A generic integer that contains the length of something.
2035
2036 @item line
2037 A line number within a source file,
2038 or a global line number.
2039
2040 @item lookup
2041 Looks up an instance of some type that matches specified criteria,
2042 and returns that, or returns nil.
2043
2044 @item name
2045 A @code{text} that points to a name of something.
2046
2047 @item new
2048 Makes a new instance of the indicated type.
2049 Might return an existing one if appropriate---if so,
2050 similar to @code{find} without crashing.
2051
2052 @item pt
2053 Pointer to a particular character (line, column pairs)
2054 in the input file (source code being compiled).
2055
2056 @item run
2057 Performs some herculean task.  No type.
2058
2059 @item terminate
2060 Terminates, usually a module.  No type.
2061
2062 @item text
2063 A @code{char *} that points to generic text.
2064 @end table
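
To make the conventions above concrete,
here is a small C sketch of what a module following them might export.
The module @code{ffefoo} is entirely made up for this example
(no such module is claimed to exist in the FFE),
as are its fixed-size table and its function bodies.
Names ending in an underscore are used here for items
that are not meant to be exported.

```c
#include <assert.h>
#include <string.h>

/* ffe<mod><Ctype>: a typedef exported by the module, with its
   FFE<umod>_<type>... constants.  */
typedef enum
{
  FFEFOO_kindNONE,
  FFEFOO_kindTEXT
} ffefooKind;

struct ffefoo_rec_              /* not exported */
{
  ffefooKind kind;
  const char *text;
};

/* ffe<mod>: the single typedef exported by the module.  */
typedef struct ffefoo_rec_ *ffefoo;

static struct ffefoo_rec_ ffefoo_table_[8];     /* not exported */
static int ffefoo_count_;                       /* not exported */

/* initialize: initializes the module.  */
void
ffefoo_initialize (void)
{
  ffefoo_count_ = 0;
}

/* lookup, based on a text: returns the match, or nil.  */
ffefoo
ffefoo_lookup_text (const char *text)
{
  int i;

  for (i = 0; i < ffefoo_count_; i++)
    if (strcmp (ffefoo_table_[i].text, text) == 0)
      return &ffefoo_table_[i];
  return NULL;
}

/* new, based on a text: might return an existing instance.  */
ffefoo
ffefoo_new_text (const char *text)
{
  ffefoo f = ffefoo_lookup_text (text);

  if (f != NULL)
    return f;
  assert (ffefoo_count_ < 8);
  f = &ffefoo_table_[ffefoo_count_++];
  f->kind = FFEFOO_kindTEXT;
  f->text = text;
  return f;
}

/* len: the length of something, here the instance's text.  */
size_t
ffefoo_len (ffefoo f)
{
  return strlen (f->text);
}

/* terminate: terminates the module.  */
void
ffefoo_terminate (void)
{
  ffefoo_count_ = 0;
}
```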