To find out about things that are ``To Be Determined'' or ``To Be Done'',
search for the string TBD.
If you want to help by working on one or more of these items,
-email me at @email{@value{email-burley}}.
+email @email{gcc@@gcc.gnu.org}.
If you're planning to do more than just research issues and offer comments,
-see @uref{http://egcs.cygnus.com/contribute.html} for steps you might
+see @uref{http://www.gnu.org/software/contribute.html} for steps you might
need to take first.
@menu
+* Overview of Sources::
* Overview of Translation Process::
* Philosophy of Code Generation::
* Two-pass Design::
* Challenges Posed::
* Transforming Statements::
* Transforming Expressions::
+* Internal Naming Conventions::
@end menu
+@node Overview of Sources
+@section Overview of Sources
+
+The current directory layout includes the following:
+
+@table @file
+@item @value{srcdir}/gcc/
+Non-g77 files in gcc
+
+@item @value{srcdir}/gcc/f/
+GNU Fortran front end sources
+
+@item @value{srcdir}/libf2c/
+@code{libg2c} configuration and @code{g2c.h} file generation
+
+@item @value{srcdir}/libf2c/libF77/
+General support and math portion of @code{libg2c}
+
+@item @value{srcdir}/libf2c/libI77/
+I/O portion of @code{libg2c}
+
+@item @value{srcdir}/libf2c/libU77/
+Additional interfaces to Unix @code{libc} for @code{libg2c}
+@end table
+
+Components of note in @code{g77} are described below.
+
+@file{f/} as a whole contains the source for @code{g77},
+while @file{libf2c/} contains a portion of the separate program
+@code{f2c}.
+Note that the @code{libf2c} code is not part of the program @code{g77},
+just distributed with it.
+
+@file{f/} contains text files that document the Fortran compiler, source
+files for the GNU Fortran Front End (FFE), and some other stuff.
+The @code{g77} compiler code is placed in @file{f/} because it,
+along with its contents,
+is designed to be a subdirectory of a @code{gcc} source directory,
+@file{gcc/},
+which is structured so that language-specific front ends can be ``dropped
+in'' as subdirectories.
+The C++ front end (@code{g++}) is an example of this; it resides in
+the @file{cp/} subdirectory.
+Note that the C front end (also referred to as @code{gcc})
+is an exception to this, as its source files reside
+in the @file{gcc/} directory itself.
+
+@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
+also used by @code{g77}.
+These libraries are normally referred to collectively as @code{libf2c}.
+When built as part of @code{g77},
+@code{libf2c} is installed under the name @code{libg2c} to avoid
+conflict with any existing version of @code{libf2c},
+and so is often called @code{libg2c} when the
+@code{g77} version is specifically meant.
+
+The @code{netlib} version of @code{libf2c/}
+contains two distinct libraries,
+@code{libF77} and @code{libI77},
+each in its own subdirectory.
+In @code{g77}, this distinction is not made,
+beyond maintaining the subdirectory structure in the source-code tree.
+
+@file{libf2c/} is not part of the program @code{g77},
+just distributed with it.
+It contains files not present
+in the official (@code{netlib}) version of @code{libf2c},
+and also contains some minor changes made from @code{libf2c},
+to fix some bugs,
+and to facilitate automatic configuration, building, and installation of
+@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
+See @file{libf2c/README} for more information,
+including licensing conditions
+governing distribution of programs containing code from @code{libg2c}.
+
+@code{libg2c}, @code{g77}'s version of @code{libf2c},
+adds Dave Love's implementation of @code{libU77},
+in the @file{libf2c/libU77/} directory.
+This library is distributed under the
+GNU Library General Public License (LGPL)---see the
+file @file{libf2c/libU77/COPYING.LIB}
+for more information,
+as this license
+governs distribution conditions for programs containing code
+from this portion of the library.
+
+Files of note in @file{f/} and @file{libf2c/} are described below:
+
+@table @file
+@item f/BUGS
+Lists some important bugs known to be in g77.
+Or use Info (or GNU Emacs Info mode) to read
+the ``Actual Bugs'' node of the @code{g77} documentation:
+
+@smallexample
+info -f f/g77.info -n "Actual Bugs"
+@end smallexample
+
+@item f/ChangeLog
+Lists recent changes to @code{g77} internals.
+
+@item libf2c/ChangeLog
+Lists recent changes to @code{libg2c} internals.
+
+@item f/NEWS
+Contains the per-release changes.
+These include the user-visible
+changes described in the node ``Changes''
+in the @code{g77} documentation, plus internal
+changes of import.
+Or use:
+
+@smallexample
+info -f f/g77.info -n News
+@end smallexample
+
+@item f/g77.info*
+The @code{g77} documentation, in Info format,
+produced by building @code{g77}.
+
+All users of @code{g77} (not just installers) should read this,
+using the @code{more} command if neither the @code{info} command
+nor GNU Emacs (with its Info mode) is available, or if users
+aren't yet accustomed to using these tools.
+All of these files are readable as ``plain text'' files,
+though they're easier to navigate using Info readers
+such as @code{info} and GNU Emacs Info mode.
+@end table
+
+If you want to explore the FFE code, which lives entirely in @file{f/},
+here are a few clues.
+The file @file{g77spec.c} contains the @code{g77}-specific source code
+for the @code{g77} command only---this just forms a variant of the
+@code{gcc} command, so,
+just as the @code{gcc} command itself does not contain the C front end,
+the @code{g77} command does not contain the Fortran front end (FFE).
+The FFE code ends up in an executable named @file{f771},
+which does the actual compiling,
+so it contains the FFE plus the @code{gcc} back end (GBE),
+the latter to do most of the optimization, and the code generation.
+
+The file @file{parse.c} is the source file for @code{yyparse()},
+which is invoked by the GBE to start the compilation process,
+for @file{f771}.
+
+The file @file{top.c} contains the top-level FFE function @code{ffe_file};
+it and @file{top.h} together define all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
+and @samp{FFE_[A-Za-z].*} symbols.
+
+The file @file{fini.c} is a @code{main()} program that is used when building
+the FFE to generate C header and source files for recognizing keywords.
+The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
+that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
+@samp{MALLOC_[A-Za-z].*} symbols.
+
+All other modules named @var{xyz}
+consist of all files named @samp{@var{xyz}*.@var{ext}}
+and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
+and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
+If you understand all this, congratulations---it's easier for me to remember
+how it works than to type in these regular expressions.
+But it does make it easy to find where a symbol is defined.
+For example, the symbol @samp{ffexyz_set_something} would be defined
+in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
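The symbol-to-module rule above can be sketched in C.
This is only an illustration, not code from the FFE: the function name @code{module_of_symbol} is hypothetical, and the sketch assumes that a module name is the run of lowercase letters and digits (or uppercase letters and digits, for @samp{FFE} names) immediately following the prefix.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy into OUT the name of the module that would
   define SYM under the prefix patterns described above.
   Returns 0 on success, -1 if SYM fits no pattern or OUT is too small.  */
int
module_of_symbol (const char *sym, char *out, size_t outsz)
{
  const char *p;
  size_t n = 0;
  int upper;

  assert (sym != NULL && out != NULL);

  /* malloc_[a-z].*, malloc[A-Z].*, and MALLOC_[A-Za-z].* belong to the
     memory manager.  */
  if ((strncmp (sym, "malloc", 6) == 0
       && (sym[6] == '_' || isupper ((unsigned char) sym[6])))
      || strncmp (sym, "MALLOC_", 7) == 0)
    {
      if (outsz < sizeof "malloc")
        return -1;
      strcpy (out, "malloc");
      return 0;
    }

  if (strncmp (sym, "ffe", 3) == 0)
    upper = 0;
  else if (strncmp (sym, "FFE", 3) == 0)
    upper = 1;
  else
    return -1;

  /* Collect the module name, folding it to lowercase.  */
  for (p = sym + 3; *p != '\0'; p++)
    {
      unsigned char c = (unsigned char) *p;

      if (!(isdigit (c) || (upper ? isupper (c) : islower (c))))
        break;
      if (n + 1 >= outsz)
        return -1;
      out[n++] = (char) tolower (c);
    }

  /* FFExyz names must continue with '_'; ffexyz names may continue with
     '_' or an uppercase letter, or end (assumed to be a bare type name).  */
  if (upper
      ? *p != '_'
      : (*p != '_' && *p != '\0' && !isupper ((unsigned char) *p)))
    return -1;

  /* An empty module name means the top-level module, top.c and top.h.  */
  if (n == 0)
    {
      if (outsz < sizeof "top")
        return -1;
      strcpy (out, "top");
      return 0;
    }

  out[n] = '\0';
  return 0;
}
```

Given @samp{ffexyz_set_something}, the sketch yields @samp{xyz}, so the definition would be sought in @file{xyz.h} or @file{xyz.c}; given @samp{ffe_file}, it yields @samp{top}.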
+
+The ``porting'' files of note currently are:
+
+@table @file
+@item proj.c
+@itemx proj.h
+These define the ``language'' used by all the other source files,
+the language being Standard C plus some useful things
+like @code{ARRAY_SIZE} and such.
+
+@item target.c
+@itemx target.h
+These describe the target machine
+in terms of what data types are supported,
+how they are denoted
+(to what C type does an @code{INTEGER*8} map, for example),
+how to convert between them,
+and so on.
+Over time, versions of @code{g77} rely less on this file
+and more on run-time configuration based on GBE info
+in @file{com.c}.
+
+@item com.c
+@itemx com.h
+These are the primary interface to the GBE.
+
+@item ste.c
+@itemx ste.h
+These contain code for implementing recognized executable statements
+in the GBE.
+
+@item src.c
+@itemx src.h
+These contain information on the format(s) of source files
+(such as whether they are never to be processed as case-insensitive
+with regard to Fortran keywords).
+@end table
+
+If you want to debug the @file{f771} executable,
+for example if it crashes,
+note that the global variables @code{lineno} and @code{input_filename}
+are usually set to reflect the current line being read by the lexer
+during the first-pass analysis of a program unit and to reflect
+the current line being processed during the second-pass compilation
+of a program unit.
+
+If an invocation of the function @code{ffestd_exec_end} is on the stack,
+the compiler is in the second pass; otherwise, it is in the first.
+
+(This information might help you reduce a test case and/or work around
+a bug in @code{g77} until a fix is available.)
+
@node Overview of Translation Process
@section Overview of Translation Process
Stand-alone statement identification (@file{sta.c})
@item
+INCLUDE handling (@file{sti.c})
+
+@item
+Order-dependent statement identification (@file{stq.c})
+
+@item
Parsing (@file{stb.c} and @file{expr.c})
@item
the first must represent an array for this to be an assignment statement,
else it's a statement function.
-Either way, @file{sta.c} hands off the statement to @file{stb.c}
+Either way, @file{sta.c} hands off the statement to @file{stq.c}
+(via @file{sti.c}, which expands INCLUDE files).
+@file{stq.c} determines what a statement that is ambiguous on its own
+must actually be, based on the context
+established by previous statements.
+
+So, @file{stq.c} watches the statement stream for executable statements,
+END statements, and so on, so it knows whether @samp{A(B)=C} is
+(intended as) a statement-function definition or an assignment statement.
+
+After establishing the context-aware statement info, @file{stq.c}
+passes the original sample statement on to @file{stb.c}
(either its statement-function parser or its assignment-statement parser).
@file{stb.c} forms a
* g77stripcard::
* lex.c::
* sta.c::
+* sti.c::
+* stq.c::
* stb.c::
* expr.c::
* stc.c::
host products a better infrastructure in which to fit and coexist
peacefully.)
+@code{g77stripcard} probably shouldn't do any tab expansion or other
+fancy stuff.
+People can use @code{expand} or other pre-filtering if they like.
+The idea here is to keep each stage quite simple, while providing
+excellent performance for ``normal'' code.
+
+(Code with junk beyond column 73 is not really ``normal'',
+as it comes from a card-punch heritage,
+and will be increasingly hard for tomorrow's Fortran programmers to read.)
+
@node lex.c
@subsection lex.c
where dividing @samp{(@var{n} - 1)} by eight
results in a remainder of zero.
+That saves having to pass most source files through @code{expand}.
+
@item
Linefeeds (ASCII code 10)
mark the ends of lines.
Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
constants specifying whether they want C-style or straight-through
backslashes.
+
+@item
+To allow for form-neutral INCLUDE files without requiring them
+to be preprocessed,
+the fixed-form lexer should offer an extension (if possible)
+allowing a trailing @samp{&} to be ignored, especially if after
+column 72, as it would be using the traditional Unix Fortran source
+model (which ignores @emph{everything} after column 72).
@end itemize
The above implements nearly exactly what is specified by
and
@ref{Lines},
except it also provides automatic conversion of tabs
-and ignoring of newline-related carriage returns.
+and ignoring of newline-related carriage returns,
+as well as accommodating form-neutral INCLUDE files.
-It also effects the ``pure visual'' model,
+It also implements the ``pure visual'' model,
by which is meant that a user viewing his code
in a typical text editor
(assuming it's not preprocessed via @code{g77stripcard} or similar)
Any language or user-visible source form
requiring special tagging of tabs,
the ends of lines after spaces/tabs,
-and so on, is broken by this definition.
-Fortunately, Fortran @emph{itself} is not broken,
-even if most vendor-supplied defaults for their Fortran compilers @emph{are}
-in this regard.)
+and so on, fails to meet this fairly straightforward specification.
+Fortunately, Fortran @emph{itself} does not mandate such a failure,
+though most vendor-supplied defaults for their Fortran compilers @emph{do}
+fail to meet this specification for readability.)
Further, this model provides a clean interface
to whatever preprocessors or code-generators are used
@node sta.c
@subsection sta.c
+@node sti.c
+@subsection sti.c
+
+@node stq.c
+@subsection stq.c
+
@node stb.c
@subsection stb.c
(This is a horrible misfeature of the Fortran 90 language.
It's one of many such misfeatures that almost make me want
-to not support them, and forge ahead with designing a true
+to not support them, and forge ahead with designing a new
``GNU Fortran'' language that has the features,
-without the misfeatures, of Fortran 90,
-and provide programs to do the conversion automatically.)
+but not the misfeatures, of Fortran 90,
+and provide utility programs to do the conversion automatically.)
So, the lexer must gather distinct chunks of decimal strings into
a single lexeme in contexts where a single decimal lexeme might
start a Hollerith constant.
-(Which means it might as well do that all the time.)
+
+(Which probably means it might as well do that all the time
+for all multi-character lexemes, even in free-form mode,
+leaving it to subsequent phases to pull them apart as they see fit.)
Compare the treatment of this to how
must be treated---the former must be diagnosed, due to the separation
between lexemes, the latter must be accepted as a proper declaration.
+@subsubsection Hollerith Constants
+
+Recognizing a Hollerith constant---specifically,
+that an @samp{H} or @samp{h} after a digit string begins
+such a constant---requires some knowledge of context.
+
+Hollerith constants (such as @samp{2HAB}) can appear after:
+
+@itemize @bullet
+@item
+@samp{(}
+
+@item
+@samp{,}
+
+@item
+@samp{=}
+
+@item
+@samp{+}, @samp{-}, @samp{/}
+
+@item
+@samp{*}, except as noted below
+@end itemize
+
+Hollerith constants don't appear after:
+
+@itemize @bullet
+@item
+@samp{CHARACTER*},
+which can be treated generally as
+any @samp{*} that is the second lexeme of a statement
+@end itemize
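The context rule above might be expressed roughly as follows.
This is a minimal sketch, not the lexer's actual logic: the function name @code{hollerith_may_follow} and its parameters are hypothetical, and it assumes the list of preceding lexemes given above is exhaustive.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if a digit string followed by H or h
   may begin a Hollerith constant, given the single-character lexeme PREV
   that precedes the digit string and PREV_INDEX, the 1-based position of
   PREV within the statement.  */
bool
hollerith_may_follow (char prev, int prev_index)
{
  assert (prev_index >= 1);

  /* A '*' that is the second lexeme of a statement is treated like the
     one in CHARACTER*, so what follows is a length, not a Hollerith.  */
  if (prev == '*' && prev_index == 2)
    return false;

  /* Assumption: only the lexemes listed above admit a Hollerith.  */
  return prev != '\0' && strchr ("(,=+-/*", prev) != NULL;
}
```

For example, the sketch accepts a Hollerith constant after a @samp{*} that is the fourth lexeme of a statement, but rejects one after a @samp{*} that is the second.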
+
+@subsubsection Confusing Function Keyword
+
+While
+
+@smallexample
+REAL FUNCTION FOO ()
+@end smallexample
+
+must be a @code{FUNCTION} statement and
+
+@smallexample
+REAL FUNCTION FOO (5)
+@end smallexample
+
+must be a type-declaration statement,
+
+@smallexample
+REAL FUNCTION FOO (@var{names})
+@end smallexample
+
+where @var{names} is a comma-separated list of names,
+can be one or the other.
+
+The only way to disambiguate that statement
+(short of mandating free-form source or a short maximum
+length for names of external procedures)
+is based on the context of the statement.
+
+In particular, if the statement is known to be within an
+already-started program unit
+(but not at the outer level of the @code{CONTAINS} block),
+it is a type-declaration statement.
+
+Otherwise, the statement is a @code{FUNCTION} statement,
+in that it begins a function program unit
+(external, or, within @code{CONTAINS}, nested).
+
+@subsubsection Weird READ
+
+The statement
+
+@smallexample
+READ (N)
+@end smallexample
+
+is equivalent to either
+
+@smallexample
+READ (UNIT=(N))
+@end smallexample
+
+or
+
+@smallexample
+READ (FMT=(N))
+@end smallexample
+
+depending on which would be valid in context.
+
+Specifically, if @samp{N} is type @code{INTEGER},
+@samp{READ (FMT=(N))} would not be valid,
+because parentheses may not be used around @samp{N} there,
+whereas they may be used around it in @samp{READ (UNIT=(N))}.
+
+Further, if @samp{N} is type @code{CHARACTER},
+the opposite is true---@samp{READ (UNIT=(N))} is not valid,
+but @samp{READ (FMT=(N))} is.
+
+Strictly speaking, if anything follows
+
+@smallexample
+READ (N)
+@end smallexample
+
+in the statement, whether the first lexeme after the close
+parenthesis is a comma could be used to disambiguate the two cases,
+without looking at the type of @samp{N},
+because the comma is required for the @samp{READ (FMT=(N))}
+interpretation and disallowed for the @samp{READ (UNIT=(N))}
+interpretation.
+
+However, in practice, many Fortran compilers allow
+the comma for the @samp{READ (UNIT=(N))}
+interpretation anyway
+(in that they generally allow a leading comma before
+an I/O list in an I/O statement),
+and much code takes advantage of this allowance.
+
+(This is quite a reasonable allowance, since the
+juxtaposition of a comma-separated list immediately
+after an I/O control-specification list, which is also comma-separated,
+without an intervening comma,
+looks sufficiently ``wrong'' to programmers
+that they can't resist the itch to insert the comma.
+@samp{READ (I, J), K, L} simply looks cleaner than
+@samp{READ (I, J) K, L}.)
+
+So, type-based disambiguation is needed unless strict adherence
+to the standard is always assumed, and we're not going to assume that.
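The type-based disambiguation described above can be sketched as a small table lookup.
This is illustrative only: the enumerations and the function name @code{classify_read_paren} are hypothetical and do not correspond to FFE identifiers, and treating every type other than @code{INTEGER} and @code{CHARACTER} as invalid is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of the type of N in READ (N).  */
enum basic_type { TYPE_INTEGER, TYPE_CHARACTER, TYPE_OTHER };

/* Hypothetical result: which interpretation of READ (N) applies.  */
enum read_spec { READ_UNIT, READ_FMT, READ_INVALID };

/* Choose the interpretation of READ (N) from the type of N alone,
   ignoring whether a comma follows the close parenthesis.  */
enum read_spec
classify_read_paren (enum basic_type n_type)
{
  static const enum read_spec by_type[] = {
    [TYPE_INTEGER] = READ_UNIT,     /* READ (UNIT=(N)) */
    [TYPE_CHARACTER] = READ_FMT,    /* READ (FMT=(N)) */
    [TYPE_OTHER] = READ_INVALID     /* assumption: no valid reading */
  };

  assert ((size_t) n_type < sizeof by_type / sizeof by_type[0]);
  return by_type[n_type];
}
```

Because the comma is deliberately not consulted, the same answer results for @samp{READ (N) K} and @samp{READ (N), K}, which matches the decision not to assume strict adherence to the standard.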
+
@node TBD (Transforming)
@subsection TBD (Transforming)
@itemize @bullet
@item
-Just where should @code{INCLUDE} processing take place?
-
-Clearly before (or part of) statement identification (@file{sta.c}),
-since determining whether @samp{I(J)=K} is a statement-function
-definition or an assignment statement requires knowing the context,
-which in turn requires having processed @code{INCLUDE} files.
-
-@item
Just where should (if it was implemented) @code{USE} processing take place?
This gets into the whole issue of how @code{g77} should handle the concept
I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
Not having to support these makes it easier to write the new front end,
and might also avoid complicating its design.
+
+The consensus to date (1999-11-17) has been to drop this support.
+Can't recall anybody saying they're using it, in fact.
@end itemize
@node Philosophy of Code Generation
The FFE was changed back to default to using that native facility,
leaving emulation as an option.
+Later during the release cycle
+(which was called EGCS 1.2, but soon became GCC 2.95),
+bugs in the native facility were found.
+Reactions among various people included
+``the last thing we should do is change the default back'',
+``we must change the default back'',
+and ``let's figure out whether we can narrow down the bugs to
+few enough cases to allow the now-months-long-tested default
+to remain the same''.
+The last viewpoint won that particular time.
+The bugs exposed other concerns regarding ABI compliance
+when the ABI specified treatment of complex data as different
+from treatment of what Fortran and GNU C consider the equivalent
+aggregation (structure) of real (or float) pairs.
+
Other Fortran constructs---arrays, character strings,
complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
and so on---involve issues similar to those pertaining to complex arithmetic.
Both this mythical, and today's real, GBE caters to its GBEL
by, sometimes, scrambling around, cleaning up after itself---after
discovering that assumptions it made earlier during code generation
-are incorrect.)
+are incorrect.
+That's not a great design, since it indicates significant code
+paths that might be rarely tested but used in some key production
+environments.)
So, the FFE handles these discrepancies---between the order in which
it discovers facts about the code it is compiling,
the compiler must ensure that the temporary variable it wrote
is copied into the appropriate element of the @samp{CLOCKS} array.
(This assumes the compiler doesn't just reject the code,
-which it should if it is compiling under some kind of a "strict" option.)
+which it should if it is compiling under some kind of a ``strict'' option.)
@item
To determine the correct index into the @samp{CLOCKS} array,
@item
Other stuff???
@end itemize
+
+@node Internal Naming Conventions
+@section Internal Naming Conventions
+
+Names exported by FFE modules have the following (regular-expression) forms.
+Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
+where @var{mod} is lowercase or uppercase alphanumerics, respectively,
+are exported by the module @code{ffe@var{mod}},
+with the source code doing the exporting in @file{@var{mod}.h}.
+(Usually, the source code for the implementation is in @file{@var{mod}.c}.)
+
+Identifiers that don't fit the following forms
+are not considered exported,
+even if they are according to the C language.
+(For example, they might be made available to other modules
+solely for use within expansions of exported macros,
+not for use within any source code in those other modules.)
+
+@table @code
+@item ffe@var{mod}
+The single typedef exported by the module.
+
+@item FFE@var{umod}_[A-Z][A-Z0-9_]*
+(Where @var{umod} is the uppercase form of @var{mod}.)
+
+A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.
+
+@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
+A typedef exported by the module.
+
+The portion of the identifier after @code{ffe@var{mod}} is
+referred to as @code{ctype}, a capitalized (mixed-case) form
+of @code{type}.
+
+@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
+(Where @var{umod} is the uppercase form of @var{mod}.)
+
+A @code{#define} or @code{enum} constant of the type
+@code{ffe@var{mod}@var{type}},
+where @var{type} is the lowercase form of @var{ctype}
+in an exported typedef.
+
+@item ffe@var{mod}_@var{value}
+A function that does or returns something,
+as described by @var{value} (see below).
+
+@item ffe@var{mod}_@var{value}_@var{input}
+A function that does or returns something based
+primarily on the thing described by @var{input} (see below).
+@end table
+
+Below are names used for @var{value} and @var{input},
+along with their definitions.
+
+@table @code
+@item col
+A column number within a line (first column is number 1).
+
+@item file
+An encapsulation of a file's name.
+
+@item find
+Looks up an instance of some type that matches specified criteria,
+and returns that, even if it has to create a new instance or
+crash trying to find it (as appropriate).
+
+@item initialize
+Initializes, usually a module. No type.
+
+@item int
+A generic integer of type @code{int}.
+
+@item is
+A generic integer that contains a true (non-zero) or false (zero) value.
+
+@item len
+A generic integer that contains the length of something.
+
+@item line
+A line number within a source file,
+or a global line number.
+
+@item lookup
+Looks up an instance of some type that matches specified criteria,
+and returns that, or returns nil.
+
+@item name
+A @code{text} that points to a name of something.
+
+@item new
+Makes a new instance of the indicated type.
+Might return an existing one if appropriate---if so,
+similar to @code{find} without crashing.
+
+@item pt
+Pointer to a particular character (line, column pairs)
+in the input file (source code being compiled).
+
+@item run
+Performs some herculean task. No type.
+
+@item terminate
+Terminates, usually a module. No type.
+
+@item text
+A @code{char *} that points to generic text.
+@end table