* update_web_docs (PREPROCESS): Rename to WWWPREPROCESS.

diff --git a/gcc/f/ffe.texi b/gcc/f/ffe.texi
index 40dc943..8e019fa 100644
--- a/gcc/f/ffe.texi
+++ b/gcc/f/ffe.texi
@@ -19,20 +19,238 @@ as of late May, 1999.
  To find about things that are ``To Be Determined'' or ``To Be Done'',
  search for the string TBD.
  If you want to help by working on one or more of these items,
-email me at @email{@value{email-burley}}.
+email @email{gcc@@gcc.gnu.org}.
  If you're planning to do more than just research issues and offer comments,
-see @uref{http://egcs.cygnus.com/contribute.html} for steps you might
+see @uref{http://www.gnu.org/software/contribute.html} for steps you might
  need to take first.
  
  @menu
+* Overview of Sources::
  * Overview of Translation Process::
  * Philosophy of Code Generation::
  * Two-pass Design::
  * Challenges Posed::
  * Transforming Statements::
  * Transforming Expressions::
+* Internal Naming Conventions::
  @end menu
  
+@node Overview of Sources
+@section Overview of Sources
+
+The current directory layout includes the following:
+
+@table @file
+@item @value{srcdir}/gcc/
+Non-g77 files in gcc
+
+@item @value{srcdir}/gcc/f/
+GNU Fortran front end sources
+
+@item @value{srcdir}/libf2c/
+@code{libg2c} configuration and @code{g2c.h} file generation
+
+@item @value{srcdir}/libf2c/libF77/
+General support and math portion of @code{libg2c}
+
+@item @value{srcdir}/libf2c/libI77/
+I/O portion of @code{libg2c}
+
+@item @value{srcdir}/libf2c/libU77/
+Additional interfaces to Unix @code{libc} for @code{libg2c}
+@end table
+
+Components of note in @code{g77} are described below.
+
+@file{f/} as a whole contains the source for @code{g77},
+while @file{libf2c/} contains a portion of the separate program
+@code{f2c}.
+Note that the @code{libf2c} code is not part of the program @code{g77},
+just distributed with it.
+
+@file{f/} contains text files that document the Fortran compiler, source
+files for the GNU Fortran Front End (FFE), and some other stuff.
+The @code{g77} compiler code is placed in @file{f/} because it,
+along with its contents,
+is designed to be a subdirectory of a @code{gcc} source directory,
+@file{gcc/},
+which is structured so that language-specific front ends can be ``dropped
+in'' as subdirectories.
+The C++ front end (@code{g++}) is an example of this---it resides in
+the @file{cp/} subdirectory.
+Note that the C front end (also referred to as @code{gcc})
+is an exception to this, as its source files reside
+in the @file{gcc/} directory itself.
+
+@file{libf2c/} contains the run-time libraries for the @code{f2c} program,
+also used by @code{g77}.
+These libraries are normally referred to collectively as @code{libf2c}.
+When built as part of @code{g77},
+@code{libf2c} is installed under the name @code{libg2c} to avoid
+conflict with any existing version of @code{libf2c},
+and thus is often referred to as @code{libg2c} when the
+@code{g77} version is specifically being referred to.
+
+The @code{netlib} version of @code{libf2c/}
+contains two distinct libraries,
+@code{libF77} and @code{libI77},
+each in their own subdirectories.
+In @code{g77}, this distinction is not made,
+beyond maintaining the subdirectory structure in the source-code tree.
+
+@file{libf2c/} is not part of the program @code{g77},
+just distributed with it.
+It contains files not present
+in the official (@code{netlib}) version of @code{libf2c},
+and also contains some minor changes made from @code{libf2c},
+to fix some bugs,
+and to facilitate automatic configuration, building, and installation of
+@code{libf2c} (as @code{libg2c}) for use by @code{g77} users.
+See @file{libf2c/README} for more information,
+including licensing conditions
+governing distribution of programs containing code from @code{libg2c}.
+
+@code{libg2c}, @code{g77}'s version of @code{libf2c},
+adds Dave Love's implementation of @code{libU77},
+in the @file{libf2c/libU77/} directory.
+This library is distributed under the
+GNU Library General Public License (LGPL)---see the
+file @file{libf2c/libU77/COPYING.LIB}
+for more information,
+as this license
+governs distribution conditions for programs containing code
+from this portion of the library.
+
+Files of note in @file{f/} and @file{libf2c/} are described below:
+
+@table @file
+@item f/BUGS
+Lists some important bugs known to be in @code{g77}.
+Or use Info (or GNU Emacs Info mode) to read
+the ``Actual Bugs'' node of the @code{g77} documentation:
+
+@smallexample
+info -f f/g77.info -n "Actual Bugs"
+@end smallexample
+
+@item f/ChangeLog
+Lists recent changes to @code{g77} internals.
+
+@item libf2c/ChangeLog
+Lists recent changes to @code{libg2c} internals.
+
+@item f/NEWS
+Contains the per-release changes.
+These include the user-visible
+changes described in the node ``Changes''
+in the @code{g77} documentation, plus internal
+changes of import.
+Or use:
+
+@smallexample
+info -f f/g77.info -n News
+@end smallexample
+
+@item f/g77.info*
+The @code{g77} documentation, in Info format,
+produced by building @code{g77}.
+
+All users of @code{g77} (not just installers) should read this,
+using the @code{more} command if neither the @code{info} command,
+nor GNU Emacs (with its Info mode), are available, or if users
+aren't yet accustomed to using these tools.
+All of these files are readable as ``plain text'' files,
+though they're easier to navigate using Info readers
+such as @code{info} and GNU Emacs Info mode.
+@end table
+
+If you want to explore the FFE code, which lives entirely in @file{f/},
+here are a few clues.
+The file @file{g77spec.c} contains the @code{g77}-specific source code
+for the @code{g77} command only---this just forms a variant of the
+@code{gcc} command, so,
+just as the @code{gcc} command itself does not contain the C front end,
+the @code{g77} command does not contain the Fortran front end (FFE).
+The FFE code ends up in an executable named @file{f771},
+which does the actual compiling,
+so it contains the FFE plus the @code{gcc} back end (GBE),
+the latter to do most of the optimization, and the code generation.
+
+The file @file{parse.c} is the source file for @code{yyparse()},
+which is invoked by the GBE to start the compilation process,
+for @file{f771}.
+
+The file @file{top.c} contains the top-level FFE function @code{ffe_file}
+and, along with @file{top.h}, defines all @samp{ffe_[a-z].*}, @samp{ffe[A-Z].*},
+and @samp{FFE_[A-Za-z].*} symbols.
+
+The file @file{fini.c} is a @code{main()} program that is used when building
+the FFE to generate C header and source files for recognizing keywords.
+The files @file{malloc.c} and @file{malloc.h} comprise a memory manager
+that defines all @samp{malloc_[a-z].*}, @samp{malloc[A-Z].*}, and
+@samp{MALLOC_[A-Za-z].*} symbols.
+
+All other modules named @var{xyz}
+consist of all files named @samp{@var{xyz}*.@var{ext}}
+and define all @samp{ffe@var{xyz}_[a-z].*}, @samp{ffe@var{xyz}[A-Z].*},
+and @samp{FFE@var{XYZ}_[A-Za-z].*} symbols.
+If you understand all this, congratulations---it's easier for me to remember
+how it works than to type in these regular expressions.
+But it does make it easy to find where a symbol is defined.
+For example, the symbol @samp{ffexyz_set_something} would be defined
+in @file{xyz.h} and implemented there (if it's a macro) or in @file{xyz.c}.
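
The symbol-to-file lookup described above can be sketched as a small script. This is only an illustration in Python (the FFE itself is written in C). The name `locate_symbol` is hypothetical, and the sketch assumes each module lives in just `xyz.h` and `xyz.c`, ignoring any further `xyz*.ext` files a module may have.

```python
import re

# Patterns copied from the text. An empty <mod> part means the
# top-level module, which the text places in top.c and top.h.
_FFE_PATTERNS = [
    re.compile(r"ffe(?P<mod>[a-z0-9]*)_[a-z].*"),
    re.compile(r"ffe(?P<mod>[a-z0-9]*)[A-Z].*"),
    re.compile(r"FFE(?P<mod>[A-Z0-9]*)_[A-Za-z].*"),
]

# The memory manager uses its own prefix rather than ffe/FFE.
_MALLOC_PATTERNS = [
    re.compile(r"malloc_[a-z].*"),
    re.compile(r"malloc[A-Z].*"),
    re.compile(r"MALLOC_[A-Za-z].*"),
]


def locate_symbol(symbol):
    """Return (header, source) for a symbol, or None if it fits no pattern."""
    if any(p.fullmatch(symbol) for p in _MALLOC_PATTERNS):
        return ("malloc.h", "malloc.c")
    for pattern in _FFE_PATTERNS:
        match = pattern.fullmatch(symbol)
        if match:
            # Module names are lowercase in file names; "" maps to "top".
            mod = match.group("mod").lower() or "top"
            return (mod + ".h", mod + ".c")
    return None
```

For instance, `locate_symbol("ffexyz_set_something")` points at `xyz.h` and `xyz.c`, matching the example in the text; whether the definition is a macro in the header or a function in the source file still has to be checked by hand.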
+
+The ``porting'' files of note currently are:
+
+@table @file
+@item proj.c
+@itemx proj.h
+These define the ``language'' used by all the other source files,
+the language being Standard C plus some useful things
+like @code{ARRAY_SIZE} and such.
+
+@item target.c
+@itemx target.h
+These describe the target machine
+in terms of what data types are supported,
+how they are denoted
+(to what C type does an @code{INTEGER*8} map, for example),
+how to convert between them,
+and so on.
+Over time, versions of @code{g77} rely less on these files
+and more on run-time configuration based on GBE info
+in @file{com.c}.
+
+@item com.c
+@itemx com.h
+These are the primary interface to the GBE.
+
+@item ste.c
+@itemx ste.h
+These contain code for implementing recognized executable statements
+in the GBE.
+
+@item src.c
+@itemx src.h
+These contain information on the format(s) of source files
+(such as whether they are never to be processed as case-insensitive
+with regard to Fortran keywords).
+@end table
+
+If you want to debug the @file{f771} executable,
+for example if it crashes,
+note that the global variables @code{lineno} and @code{input_filename}
+are usually set to reflect the current line being read by the lexer
+during the first-pass analysis of a program unit and to reflect
+the current line being processed during the second-pass compilation
+of a program unit.
+
+If an invocation of the function @code{ffestd_exec_end} is on the stack,
+the compiler is in the second pass, otherwise it is in the first.
+
+(This information might help you reduce a test case and/or work around
+a bug in @code{g77} until a fix is available.)
+
  @node Overview of Translation Process
  @section Overview of Translation Process
  
@@ -50,6 +268,12 @@ Lexing (@file{lex.c})
  Stand-alone statement identification (@file{sta.c})
  
  @item
+INCLUDE handling (@file{sti.c})
+
+@item
+Order-dependent statement identification (@file{stq.c})
+
+@item
  Parsing (@file{stb.c} and @file{expr.c})
  
  @item
@@ -111,7 +335,18 @@ Since the second lexeme is @samp{(},
  the first must represent an array for this to be an assignment statement,
  else it's a statement function.
  
-Either way, @file{sta.c} hands off the statement to @file{stb.c}
+Either way, @file{sta.c} hands off the statement to @file{stq.c}
+(via @file{sti.c}, which expands INCLUDE files).
+@file{stq.c} determines what a statement that is ambiguous on its own
+must actually be, based on the context
+established by previous statements.
+
+So, @file{stq.c} watches the statement stream for executable statements,
+END statements, and so on, so it knows whether @samp{A(B)=C} is
+(intended as) a statement-function definition or an assignment statement.
+
+After establishing the context-aware statement info, @file{stq.c}
+passes the original sample statement on to @file{stb.c}
  (either its statement-function parser or its assignment-statement parser).
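
The kind of context tracking attributed to @file{stq.c} above can be sketched as follows. This is a minimal Python illustration, not the FFE's C code: the class and method names are hypothetical, and it assumes only two facts, that a name declared as an array makes `A(B)=C` an assignment, and that (as in Fortran 77) statement-function definitions must precede all executable statements.

```python
class StatementContext:
    """Tracks just enough context to classify statements shaped like A(B)=C."""

    def __init__(self):
        self.arrays = set()           # names declared as arrays so far
        self.seen_executable = False  # has an executable statement appeared?

    def declare_array(self, name):
        # Fortran names are case-insensitive, so normalize to uppercase.
        self.arrays.add(name.upper())

    def note_executable(self):
        self.seen_executable = True

    def note_end(self):
        # END closes the program unit, so the context starts over.
        self.arrays.clear()
        self.seen_executable = False

    def classify(self, name):
        # Assumption: once executable statements have started, the
        # statement is treated as an (intended) assignment, leaving any
        # diagnosis of a non-array target to later phases.
        if name.upper() in self.arrays or self.seen_executable:
            return "assignment"
        return "statement-function"
```

A driver would call `declare_array`, `note_executable`, and `note_end` as it watches the statement stream, then ask `classify` whenever an ambiguous statement arrives.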
  
  @file{stb.c} forms a
@@ -161,6 +396,8 @@ decimal numbering is used, and so on.
  * g77stripcard::
  * lex.c::
  * sta.c::
+* sti.c::
+* stq.c::
  * stb.c::
  * expr.c::
  * stc.c::
@@ -217,6 +454,16 @@ one that looks nothing like the others, but which offers their
  host products a better infrastructure in which to fit and coexist
  peacefully.)
  
+@code{g77stripcard} probably shouldn't do any tab expansion or other
+fancy stuff.
+People can use @code{expand} or other pre-filtering if they like.
+The idea here is to keep each stage quite simple, while providing
+excellent performance for ``normal'' code.
+
+(Code with junk beyond column 73 is not really ``normal'',
+as it comes from a card-punch heritage,
+and will be increasingly hard for tomorrow's Fortran programmers to read.)
+
  @node lex.c
  @subsection lex.c
  
@@ -299,6 +546,8 @@ as necessary to reach column @var{n},
  where dividing @samp{(@var{n} - 1)} by eight
  results in a remainder of zero.
  
+That saves having to pass most source files through @code{expand}.
+
  @item
  Linefeeds (ASCII code 10)
  mark the ends of lines.
@@ -403,6 +652,14 @@ to the appropriate @code{CHARACTER} constants.
  Then @code{g77} wouldn't have to define a prefix syntax for Hollerith
  constants specifying whether they want C-style or straight-through
  backslashes.
+
+@item
+To allow for form-neutral INCLUDE files without requiring them
+to be preprocessed,
+the fixed-form lexer should offer an extension (if possible)
+allowing a trailing @samp{&} to be ignored, especially if after
+column 72, as it would be using the traditional Unix Fortran source
+model (which ignores @emph{everything} after column 72).
  @end itemize
  
  The above implements nearly exactly what is specified by
@@ -410,9 +667,10 @@ The above implements nearly exactly what is specified by
  and
  @ref{Lines},
  except it also provides automatic conversion of tabs
-and ignoring of newline-related carriage returns.
+and ignoring of newline-related carriage returns,
+as well as accommodating form-neutral INCLUDE files.
  
-It also effects the ``pure visual'' model,
+It also implements the ``pure visual'' model,
  by which is meant that a user viewing his code
  in a typical text editor
  (assuming it's not preprocessed via @code{g77stripcard} or similar)
@@ -444,10 +702,10 @@ the GNU Fortran ``pure visual'' model meets these requirements.
  Any language or user-visible source form
  requiring special tagging of tabs,
  the ends of lines after spaces/tabs,
-and so on, is broken by this definition.
-Fortunately, Fortran @emph{itself} is not broken,
-even if most vendor-supplied defaults for their Fortran compilers @emph{are}
-in this regard.)
+and so on, fails to meet this fairly straightforward specification.
+Fortunately, Fortran @emph{itself} does not mandate such a failure,
+though most vendor-supplied defaults for their Fortran compilers @emph{do}
+fail to meet this specification for readability.)
  
  Further, this model provides a clean interface
  to whatever preprocessors or code-generators are used
@@ -457,6 +715,12 @@ Mainly, they need not worry about long lines.
  @node sta.c
  @subsection sta.c
  
+@node sti.c
+@subsection sti.c
+
+@node stq.c
+@subsection stq.c
+
  @node stb.c
  @subsection stb.c
  
@@ -588,15 +852,18 @@ a single lexeme.
  
  (This is a horrible misfeature of the Fortran 90 language.
  It's one of many such misfeatures that almost make me want
-to not support them, and forge ahead with designing a true
+to not support them, and forge ahead with designing a new
  ``GNU Fortran'' language that has the features,
-without the misfeatures, of Fortran 90,
-and provide programs to do the conversion automatically.)
+but not the misfeatures, of Fortran 90,
+and provide utility programs to do the conversion automatically.)
  
  So, the lexer must gather distinct chunks of decimal strings into
  a single lexeme in contexts where a single decimal lexeme might
  start a Hollerith constant.
-(Which means it might as well do that all the time.)
+
+(Which probably means it might as well do that all the time
+for all multi-character lexemes, even in free-form mode,
+leaving it to subsequent phases to pull them apart as they see fit.)
  
  Compare the treatment of this to how
  
@@ -613,6 +880,140 @@ CHARACTER * 12 HEY
  must be treated---the former must be diagnosed, due to the separation
  between lexemes, the latter must be accepted as a proper declaration.
  
+@subsubsection Hollerith Constants
+
+Recognizing a Hollerith constant---specifically,
+that an @samp{H} or @samp{h} after a digit string begins
+such a constant---requires some knowledge of context.
+
+Hollerith constants (such as @samp{2HAB}) can appear after:
+
+@itemize @bullet
+@item
+@samp{(}
+
+@item
+@samp{,}
+
+@item
+@samp{=}
+
+@item
+@samp{+}, @samp{-}, @samp{/}
+
+@item
+@samp{*}, except as noted below
+@end itemize
+
+Hollerith constants don't appear after:
+
+@itemize @bullet
+@item
+@samp{CHARACTER*},
+which can be treated generally as
+any @samp{*} that is the second lexeme of a statement
+@end itemize
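
The two lists above amount to a simple test on the lexemes already seen in the statement. The following Python sketch is illustrative only (the lexer itself is C); the function name `may_start_hollerith` is hypothetical, and treating an empty statement prefix as "no" is an assumption, since the lists do not cover that case.

```python
# Lexemes after which a Hollerith constant such as 2HAB can appear,
# taken from the first list in the text.
HOLLERITH_PRECEDERS = {"(", ",", "=", "+", "-", "/", "*"}


def may_start_hollerith(previous_lexemes):
    """Return True if a digit string followed by H may begin a Hollerith
    constant, given the lexemes seen so far in the current statement."""
    if not previous_lexemes:
        # Assumption: nothing precedes the digit string, so say no.
        return False
    last = previous_lexemes[-1]
    if last not in HOLLERITH_PRECEDERS:
        return False
    # The exception from the second list: a '*' that is the second
    # lexeme of the statement, as in CHARACTER*12.
    if last == "*" and len(previous_lexemes) == 2:
        return False
    return True
```

So `["CHARACTER", "*"]` yields `False`, while `["X", "=", "Y", "*"]` yields `True`.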
+
+@subsubsection Confusing Function Keyword
+
+While
+
+@smallexample
+REAL FUNCTION FOO ()
+@end smallexample
+
+must be a @code{FUNCTION} statement and
+
+@smallexample
+REAL FUNCTION FOO (5)
+@end smallexample
+
+must be a type-declaration statement,
+
+@smallexample
+REAL FUNCTION FOO (@var{names})
+@end smallexample
+
+where @var{names} is a comma-separated list of names,
+can be one or the other.
+
+The only way to disambiguate that statement
+(short of mandating free-form source or a short maximum
+length for names of external procedures)
+is based on the context of the statement.
+
+In particular, if the statement is known to be within an
+already-started program unit
+(but not at the outer level of the @code{CONTAINS} block),
+it is a type-declaration statement.
+
+Otherwise, the statement is a @code{FUNCTION} statement,
+in that it begins a function program unit
+(external, or, within @code{CONTAINS}, nested).
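
The decision procedure just described can be written out as a short sketch. It is a Python illustration rather than FFE code; the function name, its parameters, and the simplified notion of a "name" (a letter followed by letters, digits, or underscores) are assumptions made for the example.

```python
import re

_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def classify_real_function(paren_contents, in_program_unit,
                           at_contains_outer_level):
    """Classify REAL FUNCTION FOO (<paren_contents>).

    Returns "FUNCTION" or "type-declaration".
    """
    text = paren_contents.strip()
    items = [item.strip() for item in text.split(",")] if text else []
    if not items:
        # REAL FUNCTION FOO () can only be a FUNCTION statement.
        return "FUNCTION"
    if not all(_NAME.fullmatch(item) for item in items):
        # Something like (5) can only be an array declarator.
        return "type-declaration"
    # A list of names is ambiguous, so context decides.
    if in_program_unit and not at_contains_outer_level:
        return "type-declaration"
    return "FUNCTION"
```

Only the third case needs the two context flags; the first two are settled by the statement text alone.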
+
+@subsubsection Weird READ
+
+The statement
+
+@smallexample
+READ (N)
+@end smallexample
+
+is equivalent to either
+
+@smallexample
+READ (UNIT=(N))
+@end smallexample
+
+or
+
+@smallexample
+READ (FMT=(N))
+@end smallexample
+
+depending on which would be valid in context.
+
+Specifically, if @samp{N} is type @code{INTEGER},
+@samp{READ (FMT=(N))} would not be valid,
+because parentheses may not be used around @samp{N},
+whereas they may around it in @samp{READ (UNIT=(N))}.
+
+Further, if @samp{N} is type @code{CHARACTER},
+the opposite is true---@samp{READ (UNIT=(N))} is not valid,
+but @samp{READ (FMT=(N))} is.
+
+Strictly speaking, if anything follows
+
+@smallexample
+READ (N)
+@end smallexample
+
+in the statement, whether the first lexeme after the close
+parenthesis is a comma could be used to disambiguate the two cases,
+without looking at the type of @samp{N},
+because the comma is required for the @samp{READ (FMT=(N))}
+interpretation and disallowed for the @samp{READ (UNIT=(N))}
+interpretation.
+
+However, in practice, many Fortran compilers allow
+the comma for the @samp{READ (UNIT=(N))}
+interpretation anyway
+(in that they generally allow a leading comma before
+an I/O list in an I/O statement),
+and much code takes advantage of this allowance.
+
+(This is quite a reasonable allowance, since the
+juxtaposition of a comma-separated list immediately
+after an I/O control-specification list, which is also comma-separated,
+without an intervening comma,
+looks sufficiently ``wrong'' to programmers
+that they can't resist the itch to insert the comma.
+@samp{READ (I, J), K, L} simply looks cleaner than
+@samp{READ (I, J) K, L}.)
+
+So, type-based disambiguation is needed unless strict adherence
+to the standard is always assumed, and we're not going to assume that.
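
The type-based disambiguation argued for above can be sketched in a few lines. This Python fragment is only an illustration (the FFE is C); `expand_read` is a hypothetical name, and returning `None` for any type other than `INTEGER` or `CHARACTER` is an assumption, since the text discusses only those two.

```python
# Which keyword READ (N) implies, keyed by the type of N, as described
# in the text.
_KEYWORD_FOR_TYPE = {"INTEGER": "UNIT", "CHARACTER": "FMT"}


def expand_read(name, type_name):
    """Return the explicit form of READ (<name>), or None if the type of
    <name> does not settle the question."""
    keyword = _KEYWORD_FOR_TYPE.get(type_name.upper())
    if keyword is None:
        return None
    return "READ ({}=({}))".format(keyword, name)
```

Note that the sketch never looks at whether a comma follows the close parenthesis, in line with the decision not to assume strict adherence to the standard.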
+
  @node TBD (Transforming)
  @subsection TBD (Transforming)
  
@@ -623,14 +1024,6 @@ Specific issues to resolve:
  
  @itemize @bullet
  @item
-Just where should @code{INCLUDE} processing take place?
-
-Clearly before (or part of) statement identification (@file{sta.c}),
-since determining whether @samp{I(J)=K} is a statement-function
-definition or an assignment statement requires knowing the context,
-which in turn requires having processed @code{INCLUDE} files.
-
-@item
  Just where should (if it was implemented) @code{USE} processing take place?
  
  This gets into the whole issue of how @code{g77} should handle the concept
@@ -685,6 +1078,9 @@ and @samp{-fcase-initcap} options?
  I've asked @email{info-gnu-fortran@@gnu.org} for input on this.
  Not having to support these makes it easier to write the new front end,
  and might also avoid complicated its design.
+
+The consensus to date (1999-11-17) has been to drop this support.
+Can't recall anybody saying they're using it, in fact.
  @end itemize
  
  @node Philosophy of Code Generation
@@ -838,6 +1234,21 @@ were worked out.
  The FFE was changed back to default to using that native facility,
  leaving emulation as an option.
  
+Later during the release cycle
+(which was called EGCS 1.2, but soon became GCC 2.95),
+bugs in the native facility were found.
+Reactions among various people included
+``the last thing we should do is change the default back'',
+``we must change the default back'',
+and ``let's figure out whether we can narrow down the bugs to
+few enough cases to allow the now-months-long-tested default
+to remain the same''.
+The last viewpoint won that particular time.
+The bugs exposed other concerns regarding ABI compliance
+when the ABI specified treatment of complex data as different
+from treatment of what Fortran and GNU C consider the equivalent
+aggregation (structure) of real (or float) pairs.
+
  Other Fortran constructs---arrays, character strings,
  complex division, @code{COMMON} and @code{EQUIVALENCE} aggregates,
  and so on---involve issues similar to those pertaining to complex arithmetic.
@@ -1095,7 +1506,10 @@ be supported.
  Both this mythical, and today's real, GBE caters to its GBEL
  by, sometimes, scrambling around, cleaning up after itself---after
  discovering that assumptions it made earlier during code generation
-are incorrect.)
+are incorrect.
+That's not a great design, since it indicates significant code
+paths that might be rarely tested but used in some key production
+environments.)
  
  So, the FFE handles these discrepancies---between the order in which
  it discovers facts about the code it is compiling,
@@ -1143,7 +1557,7 @@ Further, after the @code{SYSTEM_CLOCK} library routine returns,
  the compiler must ensure that the temporary variable it wrote
  is copied into the appropriate element of the @samp{CLOCKS} array.
  (This assumes the compiler doesn't just reject the code,
-which it should if it is compiling under some kind of a "strict" option.)
+which it should if it is compiling under some kind of a ``strict'' option.)
  
  @item
  To determine the correct index into the @samp{CLOCKS} array,
@@ -1550,3 +1964,110 @@ to hold the value of the expression.
  @item
  Other stuff???
  @end itemize
+
+@node Internal Naming Conventions
+@section Internal Naming Conventions
+
+Names exported by FFE modules have the following (regular-expression) forms.
+Note that all names beginning @code{ffe@var{mod}} or @code{FFE@var{mod}},
+where @var{mod} is lowercase or uppercase alphanumerics, respectively,
+are exported by the module @code{ffe@var{mod}},
+with the source code doing the exporting in @file{@var{mod}.h}.
+(Usually, the source code for the implementation is in @file{@var{mod}.c}.)
+
+Identifiers that don't fit the following forms
+are not considered exported,
+even if they are according to the C language.
+(For example, they might be made available to other modules
+solely for use within expansions of exported macros,
+not for use within any source code in those other modules.)
+
+@table @code
+@item ffe@var{mod}
+The single typedef exported by the module.
+
+@item FFE@var{umod}_[A-Z][A-Z0-9_]*
+(Where @var{umod} is the uppercase form of @var{mod}.)
+
+A @code{#define} or @code{enum} constant of the type @code{ffe@var{mod}}.
+
+@item ffe@var{mod}[A-Z][A-Z][a-z0-9]*
+A typedef exported by the module.
+
+The portion of the identifier after @code{ffe@var{mod}} is
+referred to as @code{ctype}, a capitalized (mixed-case) form
+of @code{type}.
+
+@item FFE@var{umod}_@var{type}[A-Z][A-Z0-9_]*[A-Z0-9]?
+(Where @var{umod} is the uppercase form of @var{mod}.)
+
+A @code{#define} or @code{enum} constant of the type
+@code{ffe@var{mod}@var{type}},
+where @var{type} is the lowercase form of @var{ctype}
+in an exported typedef.
+
+@item ffe@var{mod}_@var{value}
+A function that does or returns something,
+as described by @var{value} (see below).
+
+@item ffe@var{mod}_@var{value}_@var{input}
+A function that does or returns something based
+primarily on the thing described by @var{input} (see below).
+@end table
+
+Below are names used for @var{value} and @var{input},
+along with their definitions.
+
+@table @code
+@item col
+A column number within a line (first column is number 1).
+
+@item file
+An encapsulation of a file's name.
+
+@item find
+Looks up an instance of some type that matches specified criteria,
+and returns that, even if it has to create a new instance or
+crash trying to find it (as appropriate).
+
+@item initialize
+Initializes, usually a module.  No type.
+
+@item int
+A generic integer of type @code{int}.
+
+@item is
+A generic integer that contains a true (non-zero) or false (zero) value.
+
+@item len
+A generic integer that contains the length of something.
+
+@item line
+A line number within a source file,
+or a global line number.
+
+@item lookup
+Looks up an instance of some type that matches specified criteria,
+and returns that, or returns nil.
+
+@item name
+A @code{text} that points to a name of something.
+
+@item new
+Makes a new instance of the indicated type.
+Might return an existing one if appropriate---if so,
+similar to @code{find} without crashing.
+
+@item pt
+Pointer to a particular character (a line, column pair)
+in the input file (source code being compiled).
+
+@item run
+Performs some herculean task.  No type.
+
+@item terminate
+Terminates, usually a module.  No type.
+
+@item text
+A @code{char *} that points to generic text.
+@end table