gcc/cp/gxxint.texi

\input texinfo  @c -*-texinfo-*-
@c %**start of header
@setfilename g++int.info
@settitle G++ internals
@setchapternewpage odd
@c %**end of header

@node Top, Limitations of g++, (dir), (dir)
@chapter Internal Architecture of the Compiler

This document is meant to describe the C++ front-end for GCC in detail.
Please send questions and comments to Benjamin Kosnik @code{<bkoz@@cygnus.com>}.

@menu
* Limitations of g++::
* Routines::
* Implementation Specifics::
* Glossary::
* Macros::
* Typical Behavior::
* Coding Conventions::
* Templates::
* Access Control::
* Error Reporting::
* Parser::
* Copying Objects::
* Exception Handling::
* Free Store::
* Mangling::  Function name mangling for C++ and Java
* Concept Index::
@end menu

@node Limitations of g++, Routines, Top, Top
@section Limitations of g++

@itemize @bullet
@item
Input source code is limited to 240 nesting levels when the parser
stack size (@code{YYSTACKSIZE}) is set to 500, the default, and each
nesting level requires around 16.4k of swap space.  The parser needs
about 2.09 stack entries per nesting level, so a 500-entry stack holds
about 500 / 2.09, or roughly 240, levels.

@cindex pushdecl_class_level
@item
I suspect there are other uses of @code{pushdecl_class_level} that do
not call @code{set_identifier_type_value} in tandem with the call to
@code{pushdecl_class_level}.  This would seem to be an omission.

@cindex access checking
@item
Access checking is unimplemented for nested types.

@cindex @code{volatile}
@item
@code{volatile} is not implemented in general.

@end itemize

@node Routines, Implementation Specifics, Limitations of g++, Top
@section Routines

This section describes some of the routines used in the C++ front-end.

@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.

@code{build_vtable}, @code{prepare_fresh_vtable}, and
@code{finish_struct} are the only routines that set @code{DECL_VPARENT}.

@code{finish_struct} can steal the virtual function table from a parent;
this prevents @code{related_vslot} from working.  When
@code{finish_struct} steals, we know that

@example
get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
@end example

@noindent
will get the related binfo.

@code{layout_basetypes} does something with the VIRTUALS.

According to Tiemann, most of the breadth-first searching done in
routines such as @code{get_base_distance} and @code{get_binfo} was not
the result of any design decision.  I have since found out that at least
one part of the compiler needs the notion of depth-first binfo
searching, so I am going to try to convert the whole thing; it should
just work.

The term @dfn{left-most} refers to the depth-first left-most node.  The
test for left-most uses @code{MAIN_VARIANT == type}, because the binfos
that have a @code{BINFO_OFFSET} of zero are shared and are their own
@code{MAIN_VARIANT}s.  The non-shared binfos further to the right are
copies of the left-most one.  Hence, if a binfo is its own
@code{MAIN_VARIANT}, it is a left-most one; if it is not, it is a
non-left-most one.
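
The depth-first, left-most notion can be illustrated outside the
compiler.  The following is a minimal sketch, not the front-end's code:
@code{ToyBinfo}, @code{depth_first}, @code{left_most_path},
@code{example_hierarchy}, and @code{example_depth_first} are hypothetical
names, and a toy binfo is reduced to a class name plus its direct bases
in declaration order.  For the hierarchy @code{struct A : Z} and
@code{struct C : A, B}, a depth-first walk visits C, A, Z, B, and the
left-most path (always following the first base) is C, A, Z.

@verbatim
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a binfo: a class name plus its direct bases
// in declaration order.  This is not the front-end's real data structure.
struct ToyBinfo {
    std::string name;
    std::vector<const ToyBinfo*> bases;
};

// Depth-first, left-to-right (pre-order) walk of the inheritance graph.
void depth_first(const ToyBinfo& b, std::vector<std::string>& out) {
    out.push_back(b.name);
    for (const ToyBinfo* base : b.bases)
        depth_first(*base, out);
}

// The left-most path: start at the class and keep following the first
// direct base until a class with no bases is reached.
std::vector<std::string> left_most_path(const ToyBinfo& b) {
    std::vector<std::string> out{b.name};
    const ToyBinfo* cur = &b;
    while (!cur->bases.empty()) {
        cur = cur->bases.front();
        out.push_back(cur->name);
    }
    return out;
}

// Example hierarchy: struct Z; struct A : Z; struct B; struct C : A, B.
const ToyBinfo& example_hierarchy() {
    static const ToyBinfo z{"Z", {}};
    static const ToyBinfo a{"A", {&z}};
    static const ToyBinfo b{"B", {}};
    static const ToyBinfo c{"C", {&a, &b}};
    return c;
}

// Depth-first visiting order for the example hierarchy.
std::vector<std::string> example_depth_first() {
    std::vector<std::string> out;
    depth_first(example_hierarchy(), out);
    return out;
}
```
@end verbatim

In simple non-virtual hierarchies, the classes on the left-most path are
typically the ones laid out at offset zero, which is consistent with the
sharing described above; that correspondence is ABI-dependent and is an
assumption of this sketch.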

The path and distance computed by @code{get_base_distance} matter in the
following uses:

@itemize @bullet
@item
@code{prepare_fresh_vtable} (the code is probably wrong)
@item
@code{init_vfields} depends upon the distance, probably in a safe way.
@code{build_offset_ref} might use partial paths to do further lookups,
and @code{hack_identifier} is probably not checking access properly.

@item
@code{get_first_matching_virtual} probably should check for
@code{get_base_distance} returning -2.

@item
@code{resolve_offset_ref} should be called in a more deterministic
manner.  Right now, it is called in a seemingly arbitrary set of
contexts: for arguments at @code{build_method_call} time, and at
@code{default_conversion} time, @code{convert_arguments} time,
@code{build_unary_op} time, @code{build_c_cast} time,
@code{build_modify_expr} time, @code{convert_for_assignment} time, and
@code{convert_for_initialization} time.

But there are still more contexts in which it needs to be called; one
was the ever-simple:

@example
if (obj.*pmi != 7)
   @dots{}
@end example

It seems that the problems were due to the fact that the
@code{TREE_TYPE} of the @code{OFFSET_REF} was not an @code{OFFSET_TYPE},
but rather the type of the referent (such as @code{INTEGER_TYPE}).  This
problem was fixed by changing @code{default_conversion} to check
@code{TREE_CODE (x)} as well, instead of only checking whether
@code{TREE_CODE (TREE_TYPE (x))} was @code{OFFSET_TYPE}.

@end itemize
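
For reference, the construct at issue is an ordinary comparison through
a pointer to data member.  The standalone C++ sketch below shows what the
front-end must resolve in that case; the names @code{S} and
@code{differs_from_seven} are made up for illustration.  In
@code{obj.*pmi}, the pointer to member @code{pmi} selects which member of
@code{obj} is read, and only after that selection can the result be
compared with 7.

@verbatim
```cpp
#include <cassert>

struct S {
    int x;
    int y;
};

// Compares the member selected by the pointer-to-member `pmi` against 7;
// this is the `obj.*pmi != 7` pattern discussed above.
bool differs_from_seven(const S& obj, int S::*pmi) {
    return obj.*pmi != 7;
}
```
@end verbatim

The same function body works for any @code{int} member of @code{S},
which is why the type of @code{obj.*pmi} is the member's type rather than
anything tied to a particular member.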

@node Implementation Specifics, Glossary, Routines, Top
@section Implementation Specifics

@itemize @bullet
@item Explicit Initialization

The global list @code{current_member_init_list} contains the list of
mem-initializers specified in a constructor declaration.  For example:

@example
foo::foo() : a(1), b(2) @{@}
@end example

@noindent
will initialize @samp{a} with 1 and @samp{b} with 2.
@code{expand_member_init} places each initialization (such as @samp{a}
with 1) on the global list.  Then, when the function declaration is
processed, @code{emit_base_init} runs down the list, performing the
initializations.  It used to be the case that g++ first ran down
@code{current_member_init_list}, then ran down the list of members,
initializing the ones that were not explicitly initialized.  The code
was rewritten to perform the initializations in the order in which the
members are declared in the class.  So, for the above example, @samp{a}
and @samp{b} will be initialized in the order that they were declared:

@example
class foo @{ public: int b; int a; foo (); @};
@end example

@noindent
Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
initialized with 1, regardless of the order in which they are listed in
the mem-initializer list.

@item The Explicit Keyword

When @code{explicit} appears on a constructor, @code{grokdeclarator}
uses it to set the field @code{DECL_NONCONVERTING_P}.  That value is
used by @code{build_method_call} and @code{build_user_type_conversion_1}
to decide whether a particular constructor should be considered as a
candidate for conversions.

@end itemize
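
Both behaviors can be observed from user code.  The following sketch is
standalone C++ and makes no use of front-end internals;
@code{init_trace}, @code{record}, @code{construct_foo_trace},
@code{Implicit}, and @code{Explicit} are hypothetical names.  The class
@code{foo} repeats the example above, with each member's initializer
logging its name, so constructing a @code{foo} records @samp{b} before
@samp{a}.  The two @code{static_assert}s check that an @code{explicit}
constructor does not take part in implicit conversion from @code{int}.

@verbatim
```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Log of member initializations, in the order they actually happen.
std::string init_trace;

// Appends `name` to the log and returns `value` as the initializer.
int record(char name, int value) {
    init_trace += name;
    return value;
}

// Same shape as the example above: b is declared before a, so b is
// initialized first even though the mem-initializer list names a first.
// (Compilers typically warn about this ordering, e.g. GCC's -Wreorder.)
class foo {
public:
    int b;
    int a;
    foo() : a(record('a', 1)), b(record('b', 2)) {}
};

// Constructs a foo and returns the order in which its members were set.
std::string construct_foo_trace() {
    init_trace.clear();
    foo f;
    assert(f.a == 1 && f.b == 2);
    return init_trace;
}

// A non-explicit constructor is a candidate for implicit conversion;
// an explicit one is not.
struct Implicit { Implicit(int) {} };
struct Explicit { explicit Explicit(int) {} };

static_assert(std::is_convertible<int, Implicit>::value,
              "int converts implicitly through a non-explicit constructor");
static_assert(!std::is_convertible<int, Explicit>::value,
              "an explicit constructor is not used for implicit conversion");
```
@end verbatim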

@node Glossary, Macros, Implementation Specifics, Top
@section Glossary

@table @r
@item binfo
The main data structure in the compiler used to represent the
inheritance relationships between classes.  The data in the binfo can be
accessed by the @code{BINFO_} accessor macros.

@item vtable
@itemx virtual function table

The virtual function table holds information used in virtual function
dispatching.  In the compiler, virtual function tables are usually
referred to as vtables or vtbls.  The first index is not used in the
normal way; I believe it is probably used for the virtual destructor.

@item vfield

Vfields can be thought of as the base information needed to build
vtables.  For every vtable that exists for a class, there is a vfield.
See also vtable and virtual function table pointer.  When a type is used
as a base class of another type, the virtual function table for the
derived class can be based upon the vtable for the base class, just
extended to include the additional virtual methods declared in the
derived class.  The virtual function table from a virtual base class is
never reused in a derived class.  @code{is_normal} depends upon this.

@item virtual function table pointer

These are @code{FIELD_DECL}s of pointer type that point to vtables.  See
also vtable and vfield.
@end table

@node Macros, Typical Behavior, Glossary, Top
@section Macros

This section describes some of the macros used on trees.  The list
should be alphabetical.  Eventually all macros should be documented
here.

@table @code
@item BINFO_BASETYPES
A vector of additional binfos for the types inherited by this basetype.
The binfos are fully unshared (except for virtual bases, in which
case the binfo structure is shared).

If this basetype describes type D as inherited in C, and the basetypes
of D are E and F, then this vector contains binfos for the inheritance
of E and F by C.

Has values of:

        TREE_VECs


@item BINFO_INHERITANCE_CHAIN
Temporarily used to represent specific inheritances.  It usually points
to the binfo associated with the lesser derived type, but it can be
reversed by @code{reverse_path}.  For example:

@example
        Z ZbY   least derived
        |
        Y YbX
        |
        X Xb    most derived

TYPE_BINFO (X) == Xb
BINFO_INHERITANCE_CHAIN (Xb) == YbX
BINFO_INHERITANCE_CHAIN (YbX) == ZbY
BINFO_INHERITANCE_CHAIN (ZbY) == 0
@end example

I am not sure the above is really true; @code{get_base_distance} has it
point towards the most derived type, the opposite of the above.

Set by build_vbase_path, recursive_bounded_basetype_p,
get_base_distance, lookup_field, lookup_fnfields, and reverse_path.

What things can this be used on:

        TREE_VECs that are binfos


@item BINFO_OFFSET
The offset where this basetype appears in its containing type.  The
@code{BINFO_OFFSET} slot holds the offset (in bytes) from the start of
the complete object to the start of the part of the object that is
allocated on behalf of this type.  This is always 0 except when there is
multiple inheritance.

Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.


@item BINFO_VIRTUALS
A unique list of functions for the virtual function table.  See also
TYPE_BINFO_VIRTUALS.

What things can this be used on:

        TREE_VECs that are binfos


@item BINFO_VTABLE
Used to find the VAR_DECL that is the virtual function table associated
with this binfo.  See also TYPE_BINFO_VTABLE.  To get the virtual
function table pointer, see CLASSTYPE_VFIELD.

What things can this be used on:

        TREE_VECs that are binfos

Has values of:

        VAR_DECLs that are virtual function tables


@item BLOCK_SUPERCONTEXT
In the outermost scope of each function, it points to the FUNCTION_DECL
node.  This improves DWARF support for inline functions.


@item CLASSTYPE_TAGS
CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
class.  TREE_PURPOSE is the name and TREE_VALUE is the type.
(@code{pushclass} scans these and calls @code{pushtag} on them.)

finish_struct scans these to produce TYPE_DECLs to add to the
TYPE_FIELDS of the type.

The name found in the TREE_PURPOSE slot is expected to be unique;
resolve_scope_to_name is one place that depends upon this uniqueness.


@item CLASSTYPE_METHOD_VEC
The following is true after finish_struct has been called (on the
class?) but not before; before finish_struct is called, things are
different to some extent.

Contains a TREE_VEC of methods of the class.  The TREE_VEC_LENGTH is the
number of differently named methods plus one for the 0th entry.  The 0th
entry is always allocated and is reserved for ctors and dtors.  If there
are none, TREE_VEC_ELT (N, 0) == NULL_TREE.  Each entry of the TREE_VEC
is a FUNCTION_DECL.  For each FUNCTION_DECL, there is a DECL_CHAIN slot.
If the FUNCTION_DECL is the last one with a given name, the DECL_CHAIN
slot is NULL_TREE.  Otherwise it is the next method that has the same
name (but a different signature).

One might think that, because the DECL_CHAIN slot is used in this way,
we cannot call pushdecl to put the method in the global scope (since
that would overwrite the TREE_CHAIN slot).  That does not seem to be
true, because the two use different _CHAINs.  finish_struct_methods sets
up one version of the TREE_CHAIN slots on the FUNCTION_DECLs.

Friends are kept in TREE_LISTs, so that there's no need to use their
TREE_CHAIN slot for anything.

Has values of:

        TREE_VECs


@item CLASSTYPE_VFIELD
Seems to be in the process of being renamed TYPE_VFIELD.  Use on types
to get the main virtual function table pointer.  To get the virtual
function table, use BINFO_VTABLE (TYPE_BINFO ()).

Has values of:

        FIELD_DECLs that are virtual function table pointers

What things can this be used on:

        RECORD_TYPEs


@item DECL_CLASS_CONTEXT
Identifies the context that the _DECL was found in.  For virtual function
tables, it points to the type associated with the virtual function
table.  See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.

The difference between this and DECL_CONTEXT shows up for virtual
functions like:

@example
struct A
@{
  virtual int f ();
@};

struct B : A
@{
  int f ();
@};

DECL_CONTEXT (A::f) == A
DECL_CLASS_CONTEXT (A::f) == A

DECL_CONTEXT (B::f) == A
DECL_CLASS_CONTEXT (B::f) == B
@end example

Has values of:

        RECORD_TYPEs, or UNION_TYPEs

What things can this be used on:

        TYPE_DECLs, _DECLs


@item DECL_CONTEXT
Identifies the context that the _DECL was found in.  Can be used on
virtual function tables to find the type associated with the virtual
function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
better access method.  Internally the same as DECL_FIELD_CONTEXT, so
don't use both.  See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
DECL_CLASS_CONTEXT.

Has values of:

        RECORD_TYPEs


What things can this be used on:

@display
VAR_DECLs that are virtual function tables
_DECLs
@end display


@item DECL_FIELD_CONTEXT
Identifies the context that the FIELD_DECL was found in.  Internally the
same as DECL_CONTEXT, so don't use both.  See also DECL_CONTEXT,
DECL_FCONTEXT and DECL_CLASS_CONTEXT.

Has values of:

        RECORD_TYPEs

What things can this be used on:

@display
FIELD_DECLs that are virtual function pointers
FIELD_DECLs
@end display


@item DECL_NAME

Has values of:

@display
0 for things that don't have names
IDENTIFIER_NODEs for TYPE_DECLs
@end display

@item DECL_IGNORED_P
A bit that can be set to inform the debug information output routines in
the back-end that a certain _DECL node should be totally ignored.

Used in cases where it is known that the debugging information will be
output in another file, or where a sub-type is known not to be needed
because the enclosing type is not needed.

This bit is also set on the compiler-constructed virtual destructor of a
derived class that does not define a destructor itself, when a base
class defines one explicitly.  It is also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.''
c-decl and c-lex.c both want DECL_IGNORED_P set for ``internally
generated vars'' and ``user-invisible variables.''

Functions built by the C++ front-end, such as default destructors,
virtual destructors, and default constructors, want to be marked as
compiler generated, but it is unclear why.

Currently, it is used in an absolute way in the C++ front-end, as an
optimization, to tell the debug information output routines not to
generate debugging information that will be output by another separately
compiled file.


@item DECL_VIRTUAL_P
A flag used on FIELD_DECLs and VAR_DECLs.  (The documentation in tree.h
is wrong.)  Used in VAR_DECLs to indicate that the variable is a vtable.
It is also used in FIELD_DECLs for vtable pointers.

What things can this be used on:

        FIELD_DECLs and VAR_DECLs


@item DECL_VPARENT
Used to point to the parent type of the vtable if there is one, else it
is just the type associated with the vtable.  Because of the sharing of
virtual function tables that goes on, this slot is not very useful, and
is in fact not used in the compiler at all.  It can be removed.

What things can this be used on:

        VAR_DECLs that are virtual function tables

Has values of:

        RECORD_TYPEs maybe UNION_TYPEs


@item DECL_FCONTEXT
Used to find the first baseclass in which this FIELD_DECL is defined.
See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.

How it is used:

        Used when writing out debugging information about vfield and
        vbase decls.

What things can this be used on:

        FIELD_DECLs that are virtual function pointers
        FIELD_DECLs


@item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.

What things can this be used on:

        PARM_DECLs and VAR_DECLs that have a reference type


@item DECL_VINDEX
Used for FUNCTION_DECLs in two different ways.  Before the structure
containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to the
FUNCTION_DECL in a base class that this FUNCTION_DECL will replace as a
virtual function.  When the class is laid out, this pointer is changed
to an INTEGER_CST node which is suitable to find an index into the
virtual function table.  See get_vtable_entry as to how one can find the
right index into the virtual function table.  The first index, 0, of a
virtual function table is not used in the normal way, so the first real
index is 1.

DECL_VINDEX may be a TREE_LIST, which would seem to be a list of
overridden FUNCTION_DECLs.  add_virtual_function has code to deal with
this when it uses the variable base_fndecl_list, but it would seem that
somehow it is possible for the TREE_LIST to persist until method_call,
and it should not.


What things can this be used on:

        FUNCTION_DECLs


@item DECL_SOURCE_FILE
Identifies what source file a particular declaration was found in.

Has values of:

        "<built-in>" on TYPE_DECLs to mean the typedef is built in


@item DECL_SOURCE_LINE
Identifies what source line number in the source file the declaration
was found at.

Has values of:

@display
0 for an undefined label

0 for TYPE_DECLs that are internally generated

0 for FUNCTION_DECLs for functions generated by the compiler
        (not yet, but should be)

0 for ``magic'' arguments to functions, that the user has no
        control over
@end display


@item TREE_USED

Has values of:

        0 for unused labels


@item TREE_ADDRESSABLE
A flag that is set for any type that has a constructor.


@item TREE_COMPLEXITY
Its uses seem to be a kludgy way to track recursion, popping, and
pushing.  They only appear in cp-decl.c and cp-decl2.c, so they are a
good candidate for proper fixing and removal.


@item TREE_HAS_CONSTRUCTOR
A flag to indicate when a CALL_EXPR represents a call to a constructor.
If set, we know that the type of the object is the complete type of the
object, and that the value returned is nonnull.  When used in this
fashion, it is an optimization.  Can also be used on SAVE_EXPRs to
indicate when they are of fixed type and nonnull.  Can also be used on
INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.


@item TREE_PRIVATE
Set for FIELD_DECLs by finish_struct, but not uniformly.

The following routines do something with PRIVATE access:
build_method_call, alter_access, finish_struct_methods,
finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
GNU_xref_member, dbxout_type_fields, dbxout_type_method_1.


@item TREE_PROTECTED
The following routines do something with PROTECTED access:
build_method_call, alter_access, finish_struct, convert_to_aggr,
CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
dbxout_type_method_1.


@item TYPE_BINFO
Used to get the binfo for the type.

Has values of:

        TREE_VECs that are binfos

What things can this be used on:

        RECORD_TYPEs


@item TYPE_BINFO_BASETYPES
See also BINFO_BASETYPES.

@item TYPE_BINFO_VIRTUALS
A unique list of functions for the virtual function table.  See also
BINFO_VIRTUALS.

What things can this be used on:

        RECORD_TYPEs


@item TYPE_BINFO_VTABLE
Points to the virtual function table associated with the given type.
See also BINFO_VTABLE.

What things can this be used on:

        RECORD_TYPEs

Has values of:

        VAR_DECLs that are virtual function tables


@item TYPE_NAME
Names the type.

Has values of:

@display
0 for things that don't have names.
should be IDENTIFIER_NODE for RECORD_TYPEs, UNION_TYPEs and
        ENUM_TYPEs.
TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
        shouldn't be.
TYPE_DECL for typedefs, unsure why.
@end display

What things can one use this on:

@display
TYPE_DECLs
RECORD_TYPEs
UNION_TYPEs
ENUM_TYPEs
@end display

History:

        It currently points to the TYPE_DECL for RECORD_TYPEs,
        UNION_TYPEs and ENUM_TYPEs, but it should be history soon.


@item TYPE_METHODS
Synonym for @code{CLASSTYPE_METHOD_VEC}.  Chained together with
@code{TREE_CHAIN}.  @file{dbxout.c} uses this to get at the methods of a
class.


@item TYPE_DECL
Used to represent typedefs, and to represent binding layers.

Components:

        DECL_NAME is the name of the typedef.  For example, foo would
        be found in the DECL_NAME slot when @code{typedef int foo;} is
        seen.

        DECL_SOURCE_LINE identifies what source line number in the
        source file the declaration was found at.  A value of 0
        indicates that this TYPE_DECL is just an internal binding layer
        marker, and does not correspond to a user supplied typedef.

        DECL_SOURCE_FILE

@item TYPE_FIELDS
A linked list (via @code{TREE_CHAIN}) of member types of a class.  The
list can contain @code{TYPE_DECL}s, but apparently there can also be
other things in the list.  See also @code{CLASSTYPE_TAGS}.


@item TYPE_VIRTUAL_P
When used on a @code{FIELD_DECL} or a @code{VAR_DECL}, this flag
indicates that it is a virtual function table or a pointer to one.  When
used on a @code{FUNCTION_DECL}, it indicates that it is a virtual
function.  When used on an @code{IDENTIFIER_NODE}, it indicates that a
function with this same name exists and has been declared virtual.

When used on types, it indicates that the type has virtual functions, or
is derived from one that does.

Not sure if the above about virtual function tables is still true.  See
also info on @code{DECL_VIRTUAL_P}.

What things can this be used on:

        FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs


@item VF_BASETYPE_VALUE
Get the associated type from the binfo that caused the given vfield to
exist.  This is the least derived class (the most parent class) that
needed a virtual function table.  It is probably the case that all uses
of this field are misguided, but they need to be examined on a
case-by-case basis.  See History below for more information on why the
previous statement was made.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields

History:

        This field was used to determine if a virtual function table's
        slot should be filled in with a certain virtual function, by
        checking to see if the type returned by VF_BASETYPE_VALUE was a
        parent of the context in which the old virtual function existed.
        This incorrectly assumes that a given type _could_ not appear as
        a parent twice in a given inheritance lattice.  For single
        inheritance, this would in fact work, because a type could not
        possibly appear more than once in an inheritance lattice, but
        with multiple inheritance, a type can appear more than once.


@item VF_BINFO_VALUE
Identifies the binfo that caused this vfield to exist.  If this vfield
is from the first direct base class that has a virtual function table,
then VF_BINFO_VALUE is NULL_TREE; otherwise it will be the binfo of the
direct base where the vfield came from.  Can use @code{TREE_VIA_VIRTUAL}
on the result to find out if it is a virtual base class.  Related to the
binfo found by

@example
get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
@end example

@noindent
where @samp{t} is the type that has the given vfield; that call returns
the binfo for the given vfield.

May or may not be set at @code{modify_vtable_entries} time.  Set at
@code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item VF_DERIVED_VALUE
Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item VF_NORMAL_VALUE
Identifies the type of the most derived class of the vfield, including
the class this vfield is for.

Set at @code{finish_base_struct} time.

What things can this be used on:

        TREE_LISTs that are vfields


@item WRITABLE_VTABLES
This is an option that can be defined when building the compiler; it
causes the compiler to output vtables into the data segment so that the
vtables may be written.  It is undefined by default, because normally
the vtables should not be writable.  People who implement object I/O
facilities, or who want to change the dynamic type of objects, may want
the vtables to be writable.  Another way of achieving this would be to
copy the vtable into writable memory, but the drawback there is that
this method only changes the type of one object.

@end table
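
The statement under @code{BINFO_OFFSET} that base offsets are 0 except
under multiple inheritance can be checked from ordinary C++.  The sketch
below is an illustration under stated assumptions, not front-end code:
the classes @code{A}, @code{B}, @code{C}, @code{D} and the helper
@code{base_offset} are hypothetical, and the helper measures how far a
base subobject sits from the start of a complete object by converting a
derived pointer to a base pointer.  Layout is ABI-dependent; on the
common Itanium C++ ABI used by GCC, the first base of these simple
non-polymorphic classes lands at offset 0 and the second at a non-zero
offset, but the C++ standard does not promise specific numbers.

@verbatim
```cpp
#include <cassert>
#include <cstddef>

// Simple non-polymorphic classes (hypothetical example hierarchy).
struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };  // multiple inheritance
struct D : A { int d; };     // single inheritance

// Byte offset of the Base subobject inside a complete Derived object.
// static_cast from Derived* to Base* adjusts the pointer by that offset.
template <class Base, class Derived>
std::ptrdiff_t base_offset() {
    static Derived obj{};
    const char* whole = reinterpret_cast<const char*>(&obj);
    const char* part = reinterpret_cast<const char*>(static_cast<Base*>(&obj));
    return part - whole;
}
```
@end verbatim

Under single inheritance (@code{D}) the only base sits at offset 0; under
multiple inheritance (@code{C}) only the first base can share the
object's starting address, so the second base gets a non-zero offset.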

@node Typical Behavior, Coding Conventions, Macros, Top
@section Typical Behavior

@cindex parse errors

Whenever seemingly normal code fails with errors like
@code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
returning a NULL_TREE for some reason.

@node Coding Conventions, Templates, Typical Behavior, Top
@section Coding Conventions

It should never be the case that trees are modified in-place by the
back-end, @emph{unless} it is guaranteed that the semantics are the same
no matter how shared the tree structure is.  @file{fold-const.c} still
has some cases where this is not true, but rms hypothesizes that this
will never be a problem.
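
The hazard is easy to reproduce with any representation that shares
subtrees.  The following sketch uses a hypothetical toy node type; the
names @code{Node}, @code{NodePtr}, @code{leaf}, @code{sum}, @code{eval},
and @code{y_after_change} are made up and are not GCC's @code{tree}.  Two
expressions, @samp{x = s + 1} and @samp{y = s + 2}, share the operand
@samp{s}.  Rewriting @samp{s} in place silently changes @samp{y} too,
while building a fresh node changes only @samp{x}.

@verbatim
```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy expression node: a constant leaf, or the sum of two operands.
// Operands are held through shared_ptr, so subtrees can be shared.
struct Node {
    int value = 0;                   // meaningful only for leaves
    std::shared_ptr<Node> lhs, rhs;  // both set for a sum node
};

using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(int v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

NodePtr sum(NodePtr l, NodePtr r) {
    auto n = std::make_shared<Node>();
    n->lhs = std::move(l);
    n->rhs = std::move(r);
    return n;
}

int eval(const NodePtr& n) {
    return n->lhs ? eval(n->lhs) + eval(n->rhs) : n->value;
}

// Builds x = s + 1 and y = s + 2 with a shared leaf s = 10, then
// increments s as seen by x, either in place or by copying, and
// returns the value of y afterwards.
int y_after_change(bool in_place) {
    NodePtr s = leaf(10);
    NodePtr x = sum(s, leaf(1));
    NodePtr y = sum(s, leaf(2));
    if (in_place) {
        s->value += 1;                        // shared node: y changes too
    } else {
        x = sum(leaf(s->value + 1), x->rhs);  // fresh node: only x changes
    }
    return eval(y);
}
```
@end verbatim

The in-place update would be harmless only if every expression sharing
@samp{s} were meant to change, which is the guarantee the rule above
asks for.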

@node Templates, Access Control, Coding Conventions, Top
@section Templates

A template is represented by a @code{TEMPLATE_DECL}.  The specific
fields used are:

@table @code
@item DECL_TEMPLATE_RESULT
The generic decl on which instantiations are based.  This looks just
like any other decl.

@item DECL_TEMPLATE_PARMS
The parameters to this template.
@end table

The generic decl is parsed as much like any other decl as possible,
given the parameterization.  The template decl is not built up until the
generic decl has been completed.  For template classes, a template decl
is generated for each member function and static data member, as well.

Template members of template classes are represented by a TEMPLATE_DECL
for the class' parameters around another TEMPLATE_DECL for the member's
parameters.
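
This nesting mirrors the source syntax: an out-of-class definition of a
member template of a class template carries one template parameter list
for the class and a second one for the member.  A small standalone
example, with the made-up names @code{Box} and @code{as}:

@verbatim
```cpp
#include <cassert>

// A class template with a member template.
template <class T>
struct Box {
    T value;

    template <class U>
    U as() const;
};

// Out-of-class definition: the outer parameter list belongs to the class
// template (T), the inner one to the member template (U).
template <class T>
template <class U>
U Box<T>::as() const {
    return static_cast<U>(value);
}
```
@end verbatim

In the terms used above, the definition of @code{Box<T>::as} would
presumably be represented by a @code{TEMPLATE_DECL} for @code{T} wrapped
around one for @code{U}.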

All declarations that are instantiations or specializations of templates
refer to their template and parameters through DECL_TEMPLATE_INFO.

How should I handle parsing member functions with the proper param
decls?  Set them up again or try to use the same ones?  Currently we do
the former.  We can probably do this without any extra machinery in
store_pending_inline, by deducing the parameters from the decl in
do_pending_inlines.  PRE_PARSED_TEMPLATE_DECL?

If a base is a parm, we can't check anything about it.  If a base is not
a parm, we need to check it for name binding.  Do finish_base_struct if
no bases are parameterized (only if none, including indirect, are
parms).  Nah, don't bother trying to do any of this until instantiation
-- we only need to do name binding in advance.

Always set up the method vec and fields, including synthesized methods.
Really?  We can't know the types of the copy constructor and copy
assignment operator, or whether we need a destructor, or whether we can
have a default ctor, until we know our bases and fields.  Otherwise, we
can assume and fix ourselves later.  Hopefully.

@node Access Control, Error Reporting, Templates, Top
@section Access Control
The function @code{compute_access} returns one of three values:

@table @code
@item access_public
means that the field can be accessed by the current lexical scope.

@item access_protected
means that the field cannot be accessed by the current lexical scope
because it is protected.

@item access_private
means that the field cannot be accessed by the current lexical scope
because it is private.
@end table
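
As a rough model of what the three result codes mean, the following
sketch reuses the three names above in a hypothetical enum and computes
a result from a member's declared access and the relationship of the
current scope to the declaring class.  It is not @code{compute_access}:
friends, access declarations, the access of the derivation itself, and
the extra restriction on protected access through objects of another
class are all ignored, and @code{scope_kind} and
@code{toy_compute_access} are made-up names.

@verbatim
```cpp
#include <cassert>

// Hypothetical mirror of the three result codes described above.
enum access_type { access_public, access_protected, access_private };

// Hypothetical relationship between the current lexical scope and the
// class that declares the member.
enum scope_kind { in_declaring_class, in_derived_class, elsewhere };

// Toy rule: returns access_public when the access is permitted, and
// otherwise the declared access level that blocked it.
access_type toy_compute_access(access_type declared, scope_kind scope) {
    switch (declared) {
    case access_public:
        return access_public;
    case access_protected:
        return scope == elsewhere ? access_protected : access_public;
    case access_private:
        return scope == in_declaring_class ? access_public : access_private;
    }
    return access_private;  // not reached for valid enum values
}
```
@end verbatim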
 884
 885 DECL_ACCESS is used for access declarations; alter_access creates a list
 886 of types and accesses for a given decl.
 887
 888 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
 889 codes of compute_access and were used as a cache for compute_access.
 890 Now they are not used at all.
 891
 892 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
 893 granted by the containing class.  BEWARE: TREE_PUBLIC means something
 894 completely unrelated to access control!
 895
 896 @node Error Reporting, Parser, Access Control, Top
 897 @section Error Reporting
 898
 899 The C++ front-end uses a call-back mechanism to allow functions to print
 900 out reasonable strings for types and functions without putting extra
 901 logic in the functions where errors are found.  The interface is through
 902 the @code{cp_error} function (or @code{cp_warning}, etc.).  The
 903 syntax is exactly like that of @code{error}, except that a few more
 904 conversions are supported:
 905
 906 @itemize @bullet
 907 @item
 908 %C indicates a value of `enum tree_code'.
 909 @item
 910 %D indicates a *_DECL node.
 911 @item
 912 %E indicates a *_EXPR node.
 913 @item
 914 %L indicates a value of `enum languages'.
 915 @item
 916 %P indicates the name of a parameter (i.e. "this", "1", "2", ...)
 917 @item
 918 %T indicates a *_TYPE node.
 919 @item
 920 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
 921
 922 @end itemize
 923
 924 There is some overlap between these; for instance, any of the node
 925 options can be used for printing an identifier (though only @code{%D}
 926 tries to decipher function names).
 927
For a more verbose message (@code{class foo} as opposed to just @code{foo},
including the return type for functions), add the @code{#} flag to the
conversion, as in @code{%#D}.
To have the line number on the error message indicate the line of the
DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
use @code{%+D}, or it will default to the first.
 933
 934 @node Parser, Copying Objects, Error Reporting, Top
 935 @section Parser
 936
 937 Some comments on the parser:
 938
 939 The @code{after_type_declarator} / @code{notype_declarator} hack is
 940 necessary in order to allow redeclarations of @code{TYPENAME}s, for
 941 instance
 942
 943 @example
 944 typedef int foo;
 945 class A @{
 946   char *foo;
 947 @};
 948 @end example
 949
In the above, the first @code{foo} is parsed as a @code{notype_declarator},
and the second as an @code{after_type_declarator}.
 952
 953 Ambiguities:
 954
 955 There are currently four reduce/reduce ambiguities in the parser.  They are:
 956
 957 1) Between @code{template_parm} and
 958 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
 959 identifier}.  This situation occurs in code looking like
 960
 961 @example
 962 template <class T> class A @{ @};
 963 @end example
 964
 965 It is ambiguous whether @code{class T} should be parsed as the
 966 declaration of a template type parameter named @code{T} or an unnamed
 967 constant parameter of type @code{class T}.  Section 14.6, paragraph 3 of
 968 the January '94 working paper states that the first interpretation is
 969 the correct one.  This ambiguity results in two reduce/reduce conflicts.
 970
 971 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
 972 in places where both can be accepted, such as the argument to
 973 @code{sizeof}.  Section 8.1 of the pre-San Diego working paper specifies
 974 that these ambiguous constructs will be interpreted as @code{typename}s.
 975 This ambiguity results in six reduce/reduce conflicts between
 976 @samp{absdcl} and @samp{functional_cast}.
 977
 978 3) Between @code{functional_cast} and
 979 @code{complex_direct_notype_declarator}, for various token strings.
 980 This situation occurs in code looking like
 981
 982 @example
 983 int (*a);
 984 @end example
 985
 986 This code is ambiguous; it could be a declaration of the variable
 987 @samp{a} as a pointer to @samp{int}, or it could be a functional cast of
 988 @samp{*a} to @samp{int}.  Section 6.8 specifies that the former
 989 interpretation is correct.  This ambiguity results in 7 reduce/reduce
 990 conflicts.  Another aspect of this ambiguity is code like 'int (x[2]);',
 991 which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
 992 between @samp{direct_notype_declarator} and
 993 @samp{primary}/@samp{overqualified_id}.  Finally, there are 4 r/r
 994 conflicts between @samp{expr_or_declarator} and @samp{primary} over code
 995 like 'int (a);', which could probably be resolved but would also
 996 probably be more trouble than it's worth.  In all, this situation
 997 accounts for 17 conflicts.  Ack!
 998
The second case above is responsible for the failure to parse
@samp{LinppFile ppfile (String (argv[1]), &outs, argc, argv);} (from
Rogue Wave Math.h++) as an object declaration, and must be fixed so that
it does not resolve until later.
1003
1004 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
1005 type names.  This occurs in (as one example) code like
1006
1007 @example
1008 typedef int foo, bar;
1009 class A @{
1010   foo (bar);
1011 @};
1012 @end example
1013
1014 What is @code{bar} inside the class definition?  We currently interpret
1015 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1016 @code{after_type_declarator}.  I believe that xlC is correct, in light
1017 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1018 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1019 a @i{declaration}."  However, it seems clear that this rule must be
1020 violated in the case of constructors.  This ambiguity accounts for 8
1021 conflicts.
1022
1023 Unlike the others, this ambiguity is not recognized by the Working Paper.
1024
1025 @node  Copying Objects, Exception Handling, Parser, Top
1026 @section Copying Objects
1027
1028 The generated copy assignment operator in g++ does not currently do the
1029 right thing for multiple inheritance involving virtual bases; it just
1030 calls the copy assignment operators for its direct bases.  What it
1031 should probably do is:
1032
1033 1) Split up the copy assignment operator for all classes that have
1034 vbases into "copy my vbases" and "copy everything else" parts.  Or do
1035 the trickiness that the constructors do to ensure that vbases don't get
1036 initialized by intermediate bases.
1037
1038 2) Wander through the class lattice, find all vbases for which no
1039 intermediate base has a user-defined copy assignment operator, and call
1040 their "copy everything else" routines.  If not all of my vbases satisfy
1041 this criterion, warn, because this may be surprising behavior.
1042
1043 3) Call the "copy everything else" routine for my direct bases.
1044
If we only have one direct base, we can just foist everything off onto
it.
1047
1048 This issue is currently under discussion in the core reflector
1049 (2/28/94).
1050
1051 @node  Exception Handling, Free Store, Copying Objects, Top
1052 @section Exception Handling
1053
1054 Note, exception handling in g++ is still under development.
1055
This section describes how the C++ front-end maps C++ exceptions onto
the back-end exception handling framework.

The basic mechanism of exception handling in the back-end is
unwind-protect a la elisp.  This is a general, robust, and
language-independent representation for exceptions.

The C++ front-end maps C++ exceptions into the unwind-protect semantics
as described below.
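In source-level C++ terms, unwind-protect means that a cleanup runs on every exit from a region, normal or exceptional, and that an exceptional exit then continues unwinding.  The minimal sketch below expresses this with @code{catch (...)} and a rethrow; the helper name @code{unwind_protect} is hypothetical and is not a g++ or libgcc routine.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical helper (not part of g++ or libgcc): run BODY, then run
// CLEANUP on every exit from the protected region.  If BODY throws,
// CLEANUP runs and the same exception is rethrown to the outer handler.
template <typename Body, typename Cleanup>
void unwind_protect(Body &&body, Cleanup &&cleanup) {
    try {
        std::forward<Body>(body)();
    } catch (...) {
        cleanup();
        throw;  // continue unwinding with the original exception
    }
    cleanup();  // normal end of the region
}
```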
1065
When @samp{-frtti} is used, RTTI is used to check the type of an
exception object; when it is not, the encoded name of the type of the
thrown object is used instead.  All code that originates exceptions
(including code that throws exceptions as a side effect, such as a
dynamic cast) and all code that catches exceptions must be compiled
consistently, either all with @samp{-frtti} or all with
@samp{-fno-rtti}.  It is not possible to mix RTTI-based exception
handling objects with code that does not use RTTI.  The exceptions to
this rule are code that neither catches nor throws exceptions,
@code{catch (...)}, and code that just rethrows an exception.
1075
Currently we use the normal mangling used in building function names
(@code{int} is @samp{i}, @code{const char *} is @samp{PCc}) to build the
non-RTTI type descriptors for exception handling.  These descriptors are
just plain NUL-terminated strings, and internally they are passed around
as @code{char *}.
1081
In C++, all cleanups should be protected by exception regions.  The
region starts just after the event that makes the cleanup necessary has
completed; for an automatic variable with a constructor, for example, it
starts right after the constructor has run.  The region ends just before
the finalization is expanded.  Since the backend may expand the cleanup
multiple times along different paths (once for the normal end of the
region, once for non-local gotos, once for returns, and so on), it must
take special care to protect the finalization expansion whenever the
expansion is for any reason other than the normal region end and is
`inline' (inside the exception region).  The backend can either move
such expansions out of line, or create an exception region over the
finalization to protect it.  In the latter case, the handler associated
with that region would not run the finalization as it otherwise would,
but would instead rethrow to the outer handler, taking care to skip the
normal handler for the original region.
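The rule that a cleanup region begins only after construction completes is visible in standard C++.  In this sketch (all names are illustrative), @code{a} is fully constructed before @code{b}'s constructor throws, so only @code{a}'s destructor runs during unwinding.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Records, in order, the ids of objects whose destructors have run.
std::string g_log;

struct Tracked {
    char id;
    Tracked(char c, bool fail) : id(c) {
        if (fail) throw std::runtime_error("constructor failed");
    }
    ~Tracked() { g_log += id; }
};

void make_two() {
    Tracked a('a', false);  // a is fully constructed: its cleanup region starts
    Tracked b('b', true);   // b's constructor throws, so b is never cleaned up
}
```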
1097
In Ada, they will use the more runtime-intensive approach of having
fewer regions, at the cost of additional run-time work to keep a list of
things that need cleanups.  When a variable has finished construction,
they add its cleanup to the list; when they come to the end of the
variable's lifetime, they run the list down.  If they take a hit before
the section finishes normally, they examine the list for actions to
perform.  I hope they add this logic to the back-end, as it would be
nice to have that alternative approach available in C++.
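A minimal sketch of the run-time cleanup list, written in C++ only to illustrate the idea: one protected region covers the whole scope, and each variable's cleanup is pushed onto a list once its construction finishes.  The class @code{CleanupList} and the function @code{scope} are hypothetical; this is neither the Ada implementation nor a back-end interface.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical run-time list of pending cleanups.
class CleanupList {
public:
    void push(std::function<void()> f) { list_.push_back(std::move(f)); }

    // Run pending cleanups in reverse order of registration.
    void run() {
        while (!list_.empty()) {
            std::function<void()> f = std::move(list_.back());
            list_.pop_back();
            f();
        }
    }

private:
    std::vector<std::function<void()>> list_;
};

// One region for the whole scope; LOG records which cleanups ran.
void scope(bool fail, std::vector<int> &log) {
    CleanupList cleanups;
    try {
        cleanups.push([&] { log.push_back(1); });  // variable 1 constructed
        if (fail) throw 0;                          // hit before variable 2
        cleanups.push([&] { log.push_back(2); });  // variable 2 constructed
    } catch (...) {
        cleanups.run();  // examine the list: only variable 1 is pending
        throw;
    }
    cleanups.run();      // normal end of the variables' lifetimes
}
```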
1106
On an rs6000, xlC stores exception objects on the stack, under the try
block.  When it unwinds down into a handler, the frame pointer is
adjusted back to the normal value for the frame in which the handler
resides, and the stack pointer is left unchanged from the time at which
the object was thrown.  This guarantees that there is always somewhere
to put the exception object, and that nothing can overwrite it once we
start throwing.  The only drawback is that the stack remains large.
1114
The following are some things that work in g++'s exception handling.
1116
All completely constructed temporaries and local variables are cleaned
up in all unwound scopes.  Completely constructed parts of partially
constructed objects are cleaned up; this includes partially built
arrays.  Exception specifications are now handled.  Thrown objects are
now cleaned up all the time.  We can now tell whether or not an
exception is being thrown (@code{__eh_type != 0}); we use this to call
@code{terminate} if someone executes @code{throw;} without there being
an active exception object.  @code{uncaught_exception ()} works.
Exception handling should work correctly when optimizing, and with
@samp{-fpic} or @samp{-fPIC}.
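The partially-built-array case can be checked with standard C++.  In this sketch (names are illustrative), the constructor of the third element of a local array throws; the two completely constructed elements must be destroyed during unwinding, and the third must not be.

```cpp
#include <cassert>
#include <stdexcept>

int g_constructed = 0;
int g_destroyed = 0;

// The third element's constructor throws.
struct Elem {
    Elem() {
        if (g_constructed == 2) throw std::runtime_error("third element");
        ++g_constructed;
    }
    ~Elem() { ++g_destroyed; }
};

void build_array() {
    Elem arr[3];  // elements 0 and 1 finish; element 2 throws
    (void)arr;
}
```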
1127
The following are some flaws in g++'s exception handling, as it now
stands.
1130
Only exact type matching or reference matching of throw types works when
@samp{-fno-rtti} is used.  Exception handling only works on SPARC (such
as Suns; both the @samp{-mflat} and @samp{-mno-flat} models work),
SPARClite, Hitachi SH, i386, ARM, rs6000, PowerPC, Alpha, MIPS, VAX,
m68k and z8k machines.  SPARC v9 may not work.  HPPA is mostly done, but
throwing between a shared library and user code doesn't yet work.  Some
targets have support for data-driven unwinding.  Partial support is in
for all other machines, but a stack unwinder called
@code{__unwind_function} has to be written and added to libgcc2 for
them.  The new EH code doesn't rely upon @code{__unwind_function} for
C++ code; instead it creates per-function unwinders right inside the
function.  Unfortunately, on many platforms the definition of
@code{RETURN_ADDR_RTX} in the @file{tm.h} file for the machine port is
wrong.  See below for details on @code{__unwind_function}.  RTL_EXPRs
for EH cond variables for @code{&&} and @code{||} exprs should probably
be wrapped in UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be
unsaved.
1146
Pointer conversions during exception matching, as in 15.3 p2 case 3,
are only done when @samp{-frtti} is given: `A handler with type T,
const T, T&, or const T& is a match for a throw-expression with an
object of type E if [3]T is a pointer type and E is a pointer type that
can be converted to T by a standard pointer conversion (_conv.ptr_) not
involving conversions to pointers to private or protected base classes.'
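The quoted rule can be exercised with a small standard C++ test; the class names are illustrative.  A thrown @code{Derived *} is caught by a @code{Base *} handler through a standard pointer conversion, while a pointer to a class with a private @code{Base} must not match, because that conversion would go to a private base class.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() = default;
};
struct Derived : Base {};         // public base: conversion allowed
struct Hidden : private Base {};  // private base: conversion excluded

// Returns true if a thrown T* is caught by a handler for Base*.
template <typename T>
bool caught_as_base_pointer() {
    static T object;
    try {
        throw &object;
    } catch (Base *) {
        return true;
    } catch (...) {
        return false;
    }
}
```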
1153
When the constructor invoked by a new-expression throws an exception, we
don't call the corresponding @code{operator delete} to free the storage.
See except/18 for a test case.
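A test in the spirit of except/18 (this is an independent sketch, not that test case) can observe the required behavior with class-specific allocation functions: when the constructor throws, the matching @code{operator delete} should be called exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>

// Number of times Fragile's operator delete has been called.
int g_deletes = 0;

// A class whose constructor always throws.  The class-specific
// allocation functions make it observable whether the storage obtained
// by a new-expression is released when the constructor fails.
struct Fragile {
    Fragile() { throw std::runtime_error("constructor failed"); }

    static void *operator new(std::size_t size) {
        return ::operator new(size);
    }
    static void operator delete(void *p) noexcept {
        ++g_deletes;
        ::operator delete(p);
    }
};
```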
1156
15.2 para 13: The exception being handled should be rethrown if control
reaches the end of a handler of the function-try-block of a constructor
or destructor; currently it is not.
1160
15.2 para 12: If a return statement appears in a handler of the
function-try-block of a constructor, the program is ill-formed, but this
isn't diagnosed.
1164
1165 15.2 para 11: If the handlers of a function-try-block contain a jump
1166 into the body of a constructor or destructor, the program is ill-formed,
1167 but this isn't diagnosed.
1168
1169 15.2 para 9: Check that the fully constructed base classes and members
1170 of an object are destroyed before entering the handler of a
1171 function-try-block of a constructor or destructor for that object.
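The following standard C++ sketch (names are illustrative) checks both 15.2 para 9 and 15.2 para 13 for a constructor with a function-try-block.  The member @code{m} should be destroyed before the handler is entered, and falling off the end of the handler should rethrow the exception to the code that constructed the object.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Records the order of events.
std::string g_events;

struct Member {
    ~Member() { g_events += "member-dtor;"; }
};

struct Widget {
    Member m;

    // Constructor with a function-try-block.
    Widget() try : m() {
        throw std::runtime_error("body failed");
    } catch (...) {
        // 15.2 para 9: m has already been destroyed at this point.
        g_events += "handler;";
        // 15.2 para 13: falling off the end rethrows the exception.
    }
};
```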
1172
@code{build_exception_variant} should sort the incoming list, so that it
compares exception specifications as sets rather than testing exact list
equality.  Type smashing should merge exception specifications using
set union.
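The set semantics can be sketched as follows.  This assumes, for illustration only, that an exception specification is a list of type names; the real code works on tree lists, and @code{ExceptionSpec}, @code{same_spec} and @code{smash} are hypothetical names.  Sorting and removing duplicates gives a canonical form, so comparison becomes set equality, and @code{std::set_union} implements the smashing.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical representation of an exception specification.
using ExceptionSpec = std::vector<std::string>;

// Sort and drop duplicates so that lists with the same members compare equal.
ExceptionSpec canonicalize(ExceptionSpec spec) {
    std::sort(spec.begin(), spec.end());
    spec.erase(std::unique(spec.begin(), spec.end()), spec.end());
    return spec;
}

// Set comparison instead of exact list equality.
bool same_spec(const ExceptionSpec &a, const ExceptionSpec &b) {
    return canonicalize(a) == canonicalize(b);
}

// Merge two specifications with set union.
ExceptionSpec smash(const ExceptionSpec &a, const ExceptionSpec &b) {
    ExceptionSpec x = canonicalize(a), y = canonicalize(b), out;
    std::set_union(x.begin(), x.end(), y.begin(), y.end(),
                   std::back_inserter(out));
    return out;
}
```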
1176
Thrown objects are usually allocated on the heap, in the usual way.  If
the heap is exhausted, throwing an object will probably never work.
This could be relaxed somewhat by passing an @code{__in_chrg} parameter
to track who has control over the exception object.  Thrown objects of
pointer-to-object type are not allocated on the heap.  We should extend
this so that all small objects (smaller than @code{4*sizeof(void*)}) are
stored directly instead of being allocated on the heap.
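The proposed small-object rule could look roughly like the following sketch.  @code{ThrownObject} is a hypothetical holder, not a g++ runtime structure, and it handles only trivially copyable payloads so that a plain byte copy is valid.  Objects smaller than @code{4*sizeof(void*)} go into inline storage, and larger ones are copied to the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical holder for a thrown object (trivially copyable only).
class ThrownObject {
public:
    static constexpr std::size_t kInlineSize = 4 * sizeof(void *);

    ThrownObject(const void *obj, std::size_t size) : size_(size) {
        // Small objects use the inline buffer; larger ones use the heap.
        void *dst = size < kInlineSize ? static_cast<void *>(buf_)
                                       : ::operator new(size);
        std::memcpy(dst, obj, size);
        data_ = dst;
    }
    ~ThrownObject() {
        if (!is_inline()) ::operator delete(data_);
    }
    ThrownObject(const ThrownObject &) = delete;
    ThrownObject &operator=(const ThrownObject &) = delete;

    bool is_inline() const { return data_ == buf_; }
    const void *data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    alignas(std::max_align_t) unsigned char buf_[kInlineSize];
    void *data_;
    std::size_t size_;
};
```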
1184
1185 When the backend returns a value, it can create new exception regions
1186 that need protecting.  The new region should rethrow the object in
1187 context of the last associated cleanup that ran to completion.
1188
1189 The structure of the code that is generated for C++ exception handling
1190 code is shown below:
1191
1192 @example
1193 Ln:                                     throw value;
1194         copy value onto heap
1195         jump throw (Ln, id, address of copy of value on heap)
1196
1197                                         try @{
1198 +Lstart:        the start of the main EH region
1199 |...                                            ...
1200 +Lend:          the end of the main EH region
1201                                         @} catch (T o) @{
1202                                                 ...1
1203                                         @}
1204 Lresume:
1205         nop     used to make sure there is something before
1206                 the next region ends, if there is one
1207 ...                                     ...
1208
1209         jump Ldone
1210 [
1211 Lmainhandler:    handler for the region Lstart-Lend
1212         cleanup
1213 ] zero or more, depending upon automatic vars with dtors
1214 +Lpartial:
1215 |        jump Lover
1216 +Lhere:
1217         rethrow (Lhere, same id, same obj);
1218 Lterm:          handler for the region Lpartial-Lhere
1219         call terminate
1220 Lover:
1221 [
1222  [
1223         call throw_type_match
1224         if (eq) @{
1225  ] these lines disappear when there is no catch condition
1226 +Lsregion2:
1227 |       ...1
1228 |       jump Lresume
1229 |Lhandler:      handler for the region Lsregion2-Leregion2
1230 |       rethrow (Lresume, same id, same obj);
1231 +Leregion2
1232         @}
1233 ] there are zero or more of these sections, depending upon how many
1234   catch clauses there are
1235 ----------------------------- expand_end_all_catch --------------------------
1236                 here we have fallen off the end of all catch
1237                 clauses, so we rethrow to outer
1238         rethrow (Lresume, same id, same obj);
1239 ----------------------------- expand_end_all_catch --------------------------
1240 [
1241 L1:     maybe throw routine
1242 ] depending upon if we have expanded it or not
1243 Ldone:
1244         ret
1245
1246 start_all_catch emits labels: Lresume,
1247
1248 @end example
1249
1250 The __unwind_function takes a pointer to the throw handler, and is
1251 expected to pop the stack frame that was built to call it, as well as
1252 the frame underneath and then jump to the throw handler.  It must
1253 restore all registers to their proper values as well as all other
1254 machine state as determined by the context in which we are unwinding
1255 into.  The way I normally start is to compile:
1256
@example
void *g;
void foo (void *a) @{ g = a; @}
@end example

@noindent
with @samp{-S}, and change the instruction that alters the PC (usually a
return, or @code{ret}) so that it no longer alters the PC, while leaving
all other semantics (such as adjusting the stack pointer or frame
pointers) in place.  After that, replicate the prologue once more at the
end, again changing the PC-altering instructions, and finally, at the
very end, jump to the address stored in @code{g}.
1265
It takes about a week to write this routine.  If someone volunteers to
write it for an architecture, exception support for that architecture
will be added to g++.  Please send in those code donations.  One other
thing that needs to be done is to double-check that
@code{__builtin_return_address (0)} works.
1271
1272 @subsection Specific Targets
1273
1274 For the alpha, the __unwind_function will be something resembling:
1275
1276 @example
1277 void
1278 __unwind_function(void *ptr)
1279 @{
1280   /* First frame */
1281   asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
1282   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1283
1284   /* Second frame */
1285   asm ("ldq $15, 8($30)"); /* fp */
1286   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1287
1288   /* Return */
1289   asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
1290 @}
1291 @end example
1292
1293 @noindent
1294 However, there are a few problems preventing it from working.  First of
1295 all, the gcc-internal function @code{__builtin_return_address} needs to
1296 work given an argument of 0 for the alpha.  As it stands as of August
1297 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
1298 will definitely not work on the alpha.  Instead, we need to define
1299 the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
1300 @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
1301 definition for @code{RETURN_ADDR_RTX}.
1302
1303 In addition (and more importantly), we need a way to reliably find the
1304 frame pointer on the alpha.  The use of the value 8 above to restore the
1305 frame pointer (register 15) is incorrect.  On many systems, the frame
1306 pointer is consistently offset to a specific point on the stack.  On the
1307 alpha, however, the frame pointer is pushed last.  First the return
1308 address is stored, then any other registers are saved (e.g., @code{s0}),
1309 and finally the frame pointer is put in place.  So @code{fp} could have
1310 an offset of 8, but if the calling function saved any registers at all,
1311 they add to the offset.
1312
1313 The only places the frame size is noted are with the @samp{.frame}
1314 directive, for use by the debugger and the OSF exception handling model
1315 (useless to us), and in the initial computation of the new value for
1316 @code{sp}, the stack pointer.  For example, the function may start with:
1317
1318 @example
1319 lda $30,-32($30)
1320 .frame $15,32,$26,0
1321 @end example
1322
1323 @noindent
The 32 above is exactly the value we need.  With this, we can be sure
that the frame pointer is stored 8 bytes below that offset (in this
case, at 24(sp)).
1326 The drawback is that there is no way that I (Brendan) have found to let
1327 us discover the size of a previous frame @emph{inside} the definition
1328 of @code{__unwind_function}.
1329
1330 So to accomplish exception handling support on the alpha, we need two
1331 things: first, a way to figure out where the frame pointer was stored,
1332 and second, a functional @code{__builtin_return_address} implementation
1333 for except.c to be able to use it.
1334
1335 Or just support DWARF 2 unwind info.
1336
1337 @subsection New Backend Exception Support
1338
1339 This subsection discusses various aspects of the design of the
1340 data-driven model being implemented for the exception handling backend.
1341
1342 The goal is to generate enough data during the compilation of user code,
1343 such that we can dynamically unwind through functions at run time with a
1344 single routine (@code{__throw}) that lives in libgcc.a, built by the
1345 compiler, and dispatch into associated exception handlers.
1346
1347 This information is generated by the DWARF 2 debugging backend, and
1348 includes all of the information __throw needs to unwind an arbitrary
1349 frame.  It specifies where all of the saved registers and the return
1350 address can be found at any point in the function.
1351
The major disadvantage of enabling exceptions is:

@itemize @bullet
@item
Code into which control can be transferred from an exception handler
cannot use caller-saved registers.  This should rarely be the case in
high-performance code, so the effect should be minimal.
1359
1360 @end itemize
1361
1362 @subsection Backend Exception Support
1363
The backend must be extended to fully support exceptions.  Right now
there are a few hooks into the alpha exception handling backend, which
resides in the C++ frontend, that allow exception handling to work in
g++.  An exception region is a segment of generated code that has a
handler associated with it.  Exception regions are represented in the
generated code as address ranges, given by the starting and ending PC
values of the region.  Some of the limitations of this scheme are:
1372
1373 @itemize @bullet
1374 @item
1375 The backend replicates insns for such things as loop unrolling and
1376 function inlining.  Right now, there are no hooks into the frontend's
1377 exception handling backend to handle the replication of insns.  When
1378 replication happens, a new exception region descriptor needs to be
1379 generated for the new region.
1380
1381 @item
The backend expects to be able to rearrange code, for things like jump
optimization.  Any rearranging of the code requires the exception region
descriptors to be updated accordingly.
1385
1386 @item
1387 The backend can eliminate dead code.  Any associated exception region
1388 descriptor that refers to fully contained code that has been eliminated
1389 should also be removed, although not doing this is harmless in terms of
1390 semantics.
1391
1392 @end itemize
1393
1394 The above is not meant to be exhaustive, but does include all things I
1395 have thought of so far.  I am sure other limitations exist.
1396
Below are some notes on migrating the exception handling backend code
from the C++ frontend to the backend.
1399
NOTEs are to be used to denote the start and the end of an exception
region.  I presume that the interface used to generate these notes in
the backend would be two functions, start_exception_region and
end_exception_region (or something like that).  The frontends are
required to call them in pairs.  When marking the end of a region, an
argument can be passed to indicate the handler for the marked region.
This can be passed in many ways; currently a tree is used.  Other
possibilities would be insns for the handler, or a label that denotes a
handler.  I have a feeling insns might be the best way to pass it.  The
semantics are: if an exception is thrown inside the region, control is
transferred unconditionally to the handler.  If control passes through
the handler, then the backend is to rethrow the exception in the
context of the end of the original region.  The handler is protected by
the conventional mechanisms; it is the frontend's responsibility to
protect the handler if special semantics are required.
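Once regions are recorded as PC ranges, finding the handler for a throw point is a table lookup.  The following is a minimal sketch under stated assumptions: each region is a half-open range [start, end), regions are properly nested, and an integer identifies the handler.  @code{Region} and @code{find_handler} are hypothetical names, not the backend's actual data structures.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical exception region descriptor: the region covers the
// half-open PC range [start, end), and HANDLER identifies its handler.
struct Region {
    std::uintptr_t start;
    std::uintptr_t end;
    int handler;
};

// Return the handler of the innermost region containing PC, or -1 if no
// region contains it (the exception then propagates to the caller).
// With properly nested regions, the innermost one is the smallest.
int find_handler(const std::vector<Region> &regions, std::uintptr_t pc) {
    const Region *best = nullptr;
    for (const Region &r : regions) {
        if (pc >= r.start && pc < r.end &&
            (best == nullptr || r.end - r.start < best->end - best->start)) {
            best = &r;
        }
    }
    return best ? best->handler : -1;
}
```

Note how the replication and rearrangement problems listed earlier show up here: any transformation that moves or copies insns must also update this table.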
1415
This is a very low-level view, and it would be nice if the backend
supported a somewhat higher-level view in addition to this one.  This
higher level could include the source line number, the name of the
source file, the name of the language that threw the exception and
possibly the name of the exception.  Kenner may want to rope you into
doing more than just the basics required by C++; you will have to
resolve this.  He may want you to add support for non-local gotos, and
to first scan for an exception handler and, if none is found, allow the
debugger to be entered without any cleanups being done.  To do this, the
backend would have to know the difference between a cleanup-rethrower
and a real handler, and it would also need a way to know whether a
handler `matches' a thrown exception, which is frontend specific.
1428
The stack unwinder is one of the hardest parts to do.  It is highly
machine dependent.  The form that Kenner seems to like is a couple of
macros that would do the machine-dependent grunt work.  One preexisting
function that might be of some use is @code{__builtin_return_address ()}.
One macro he seemed to want was @code{__builtin_return_address}, and the
other would do the hard work of fixing up the registers and adjusting
the stack pointer, frame pointer, arg pointer and so on.
1436
1437
1438 @node Free Store, Mangling, Exception Handling, Top
1439 @section Free Store
1440
1441 @code{operator new []} adds a magic cookie to the beginning of arrays
1442 for which the number of elements will be needed by @code{operator delete
1443 []}.  These are arrays of objects with destructors and arrays of objects
1444 that define @code{operator delete []} with the optional size_t argument.
1445 This cookie can be examined from a program as follows:
1446
1447 @example
1448 typedef unsigned long size_t;
1449 extern "C" int printf (const char *, ...);
1450
1451 size_t nelts (void *p)
1452 @{
1453   struct cookie @{
1454     size_t nelts __attribute__ ((aligned (sizeof (double))));
1455   @};
1456
1457   cookie *cp = (cookie *)p;
1458   --cp;
1459
1460   return cp->nelts;
1461 @}
1462
1463 struct A @{
1464   ~A() @{ @}
1465 @};
1466
int main ()
@{
  A *ap = new A[3];
  printf ("%lu\n", nelts (ap));
  delete [] ap;
  return 0;
@}
1472 @end example
1473
1474 @section Linkage
1475 The linkage code in g++ is horribly twisted in order to meet two design goals:
1476
1477 1) Avoid unnecessary emission of inlines and vtables.
1478
1479 2) Support pedantic assemblers like the one in AIX.
1480
1481 To meet the first goal, we defer emission of inlines and vtables until
1482 the end of the translation unit, where we can decide whether or not they
1483 are needed, and how to emit them if they are.
1484
1485 @node Mangling, Concept Index, Free Store, Top
1486 @section Function name mangling for C++ and Java
1487
Both C++ and Java provide overloaded functions and methods,
which are functions or methods with the same name but different parameter lists.
1490 Selecting the correct version is done at compile time.
1491 Though the overloaded functions have the same name in the source code,
1492 they need to be translated into different assembler-level names,
1493 since typical assemblers and linkers cannot handle overloading.
1494 This process of encoding the parameter types with the method name
1495 into a unique name is called @dfn{name mangling}.  The inverse
1496 process is called @dfn{demangling}.
1497
It is convenient that C++ and Java use compatible mangling schemes,
since that makes life easier for tools such as gdb, and it eases
integration between C++ and Java.
1501
Note there is also a standard "Java Native Interface" (JNI) which
implements a different calling convention, and uses a different
mangling scheme.  The JNI is a rather abstract ABI that lets Java call
methods written in C or C++.
We are concerned here with a lower-level interface, primarily
intended for methods written in Java, but which can also be used for C++
(and, less easily, C).
1509
1510 Note that on systems that follow BSD tradition, a C identifier @code{var}
1511 would get "mangled" into the assembler name @samp{_var}.  On such
1512 systems, all other mangled names are also prefixed by a @samp{_}
1513 which is not shown in the following examples.
1514
1515 @subsection Method name mangling
1516
1517 C++ mangles a method by emitting the function name, followed by @code{__},
1518 followed by encodings of any method qualifiers (such as @code{const}),
1519 followed by the mangling of the method's class,
1520 followed by the mangling of the parameters, in order.
1521
1522 For example @code{Foo::bar(int, long) const} is mangled
1523 as @samp{bar__C3Fooil}.
1524
1525 For a constructor, the method name is left out.
That is, @code{Foo::Foo(int, long) const} is mangled
as @samp{__C3Fooil}.
1528
1529 GNU Java does the same.
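The order of the pieces can be captured in a few lines.  This is a minimal sketch, not the g++ implementation: @code{mangle_method} and @code{mangle_class} are hypothetical helpers, only the @code{const} qualifier is handled, class names are assumed to be plain ASCII, and the parameter types are passed in already mangled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Length-prefixed class name, e.g. "Foo" -> "3Foo" (ASCII names only).
std::string mangle_class(const std::string &cls) {
    return std::to_string(cls.size()) + cls;
}

// Hypothetical sketch of the method-mangling order described above:
// name, "__", qualifiers, class, then the parameter manglings in order.
// Pass an empty NAME for a constructor.
std::string mangle_method(const std::string &name, const std::string &cls,
                          bool is_const,
                          const std::vector<std::string> &params) {
    std::string out = name + "__";
    if (is_const) out += "C";
    out += mangle_class(cls);
    for (const std::string &p : params) out += p;
    return out;
}
```

For example, @code{mangle_method ("bar", "Foo", true, ...)} with the parameter codes @samp{i} and @samp{l} reproduces @samp{bar__C3Fooil}.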
1530
1531 @subsection Primitive types
1532
1533 The C++ types @code{int}, @code{long}, @code{short}, @code{char},
1534 and @code{long long} are mangled as @samp{i}, @samp{l},
1535 @samp{s}, @samp{c}, and @samp{x}, respectively.
1536 The corresponding unsigned types have @samp{U} prefixed
1537 to the mangling.  The type @code{signed char} is mangled @samp{Sc}.
1538
1539 The C++ and Java floating-point types @code{float} and @code{double}
1540 are mangled as @samp{f} and @samp{d} respectively.
1541
1542 The C++ @code{bool} type and the Java @code{boolean} type are
1543 mangled as @samp{b}.
1544
1545 The C++ @code{wchar_t} and the Java @code{char} types are
1546 mangled as @samp{w}.
1547
1548 The Java integral types @code{byte}, @code{short}, @code{int}
1549 and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
1550 and @samp{x}, respectively.
1551
C++ code that has included @code{javatypes.h} will mangle
the typedefs @code{jbyte}, @code{jshort}, @code{jint}
and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
and @samp{x}.  (This has not been implemented yet.)
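The C++ codes listed above fit in a small lookup table.  In this sketch, @code{mangle_primitive} is a hypothetical helper that takes a spelled-out type name; an @samp{unsigned} prefix becomes @samp{U} followed by the code of the remaining type, as described above.  Unknown names yield an empty string.

```cpp
#include <cassert>
#include <map>
#include <string>

// Codes for the C++ primitive types listed above.
std::string mangle_primitive(const std::string &type) {
    static const std::map<std::string, std::string> codes = {
        {"int", "i"},   {"long", "l"},      {"short", "s"},
        {"char", "c"},  {"long long", "x"}, {"signed char", "Sc"},
        {"float", "f"}, {"double", "d"},    {"bool", "b"},
        {"wchar_t", "w"},
    };
    const std::string prefix = "unsigned ";
    if (type.compare(0, prefix.size(), prefix) == 0)
        return "U" + mangle_primitive(type.substr(prefix.size()));
    auto it = codes.find(type);
    return it == codes.end() ? "" : it->second;
}
```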
1556
1557 @subsection Mangling of simple names
1558
1559 A simple class, package, template, or namespace name is
1560 encoded as the number of characters in the name, followed by
1561 the actual characters.  Thus the class @code{Foo}
1562 is encoded as @samp{3Foo}.
1563
If any of the characters in the name are not alphanumeric
(i.e., not one of the standard ASCII letters, digits, or '_'),
or the initial character is a digit, then the name is
mangled as a sequence of encoded Unicode letters.
A Unicode encoding starts with a @samp{U} to indicate
that Unicode escapes are used, followed by the number of
bytes used by the Unicode encoding, followed by the bytes
representing the encoding.  ASCII letters and
non-initial digits are encoded without change.  However, all
other characters (including underscore and initial digits) are
translated into a sequence starting with an underscore,
followed by the big-endian 4-hex-digit lower-case encoding of the character.
1576
1577 If a method name contains Unicode-escaped characters, the
1578 entire mangled method name is followed by a @samp{U}.
1579
1580 For example, the method @code{X\u0319::M\u002B(int)} is encoded as
1581 @samp{M_002b__U6X_0319iU}.
1582
1583
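The rules above can be sketched in Python as follows.  This is an illustrative model, not the G++ implementation: the helper names are hypothetical, only characters in the Basic Multilingual Plane are handled (so four hex digits always suffice), and @code{mangle_method} takes the already-mangled argument list as a string.

@example
```python
import string

# Characters that may appear in a name mangled without Unicode escapes.
PLAIN = set(string.ascii_letters + string.digits + "_")


def needs_escapes(name):
    """True if NAME has a character outside PLAIN or starts with a digit."""
    return name[0] in string.digits or any(c not in PLAIN for c in name)


def escape(name):
    """Keep ASCII letters and non-initial digits; escape everything else."""
    out = []
    for i, c in enumerate(name):
        if c in string.ascii_letters or (i > 0 and c in string.digits):
            out.append(c)
        else:
            # '_' plus the character as 4 lower-case hex digits (BMP only).
            out.append("_%04x" % ord(c))
    return "".join(out)


def mangle_simple_name(name):
    """Length-prefixed name, or U + length + escaped text when needed."""
    if needs_escapes(name):
        body = escape(name)
        return "U" + str(len(body)) + body
    return str(len(name)) + name


def mangle_method(method, cls, arg_codes):
    """Mangle METHOD of class CLS; ARG_CODES is the mangled argument list."""
    escaped = needs_escapes(method)
    name = escape(method) if escaped else method
    mangled = name + "__" + mangle_simple_name(cls) + arg_codes
    # A trailing U records that the method name itself was escaped.
    return mangled + "U" if escaped else mangled
```
@end example

With these definitions, @code{mangle_simple_name("Foo")} gives @samp{3Foo}, and @code{mangle_method("M\u002B", "X\u0319", "i")} reproduces @samp{M_002b__U6X_0319iU} from the example above.
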
@subsection Pointer and reference types

A C++ pointer type is mangled as @samp{P} followed by the
mangling of the type pointed to.

A C++ reference type is mangled as @samp{R} followed by the
mangling of the type referenced.

A Java object reference type is equivalent
to a C++ pointer parameter, so we mangle such a parameter type
as @samp{P} followed by the mangling of the class name.

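A small recursive sketch shows how the @samp{P} and @samp{R} prefixes (and the @samp{C} modifier from the table of code characters at the end of this section) wrap the encoding of the underlying type.  The tuple representation of types and the function names are assumptions made for this illustration, and class names are assumed to be plain ASCII, so only the simple length prefix is used.

@example
```python
# Modifier codes described in this chapter.
PREFIXES = dict([("const", "C"), ("ptr", "P"), ("ref", "R")])


def mangle_type(t):
    """Mangle a type written as a nested tuple.

    ("prim", code)  is a primitive type whose code is already known, e.g. "i";
    ("class", name) is an ASCII class name, mangled as length + name;
    ("const", t), ("ptr", t) and ("ref", t) prefix C, P or R to T.
    """
    kind, arg = t
    if kind == "prim":
        return arg
    if kind == "class":
        return str(len(arg)) + arg
    return PREFIXES[kind] + mangle_type(arg)


def mangle_java_reference(class_name):
    """A Java object reference is mangled like a C++ pointer to the class."""
    return mangle_type(("ptr", ("class", class_name)))
```
@end example

For example, @code{mangle_type(("ref", ("const", ("class", "Foo"))))} yields @samp{RC3Foo}, and a Java reference to class @code{Foo} yields @samp{P3Foo}.
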
@subsection Squangled type compression

Squangling (enabled with the @samp{-fsquangle} option) uses the
@samp{B} code to indicate reuse of a previously seen type within an
identifier.  Types are recognized from left to right and given
increasing values, which are appended to the code in the standard
manner; that is, multiple-digit numbers are delimited by @samp{_}
characters.  A type is considered to be any non-primitive type,
regardless of whether it is a parameter, template parameter, or entire
template.  Certain codes are considered modifiers of a type, and are not
included as part of the type.  These are the @samp{C}, @samp{V},
@samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
constant, volatile, pointer, array, reference, unsigned, and restrict.
These codes may precede a @samp{B} type in order to make the required
modifications to the type.

For example:
@example
template <class T> class class1 @{ @};

template <class T> class class2 @{ @};

class class3 @{ @};

void f(class2<class1<class3> > a, int b, const class1<class3>& c, class3 *d) @{ @}

    // B0 -> class2<class1<class3> >
    // B1 -> class1<class3>
    // B2 -> class3
@end example
This produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
The @code{int} parameter is a basic type, so it does not receive a @samp{B} encoding.

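The numbering rule can be modeled with a short Python sketch, which is a simplification rather than the G++ algorithm.  Types are nested tuples (a representation chosen for this example); a type is numbered when its encoding starts, so an enclosing template gets a lower number than its arguments; and modifiers such as @samp{C}, @samp{P} and @samp{R} are emitted in front of a @samp{B} reference without being numbered themselves.  The leading @samp{G} in the mangled name above is not explained in this section, so the sketch does not model it and reproduces only the text after @samp{f__FG}.  All names are hypothetical.

@example
```python
# Modifier codes: emitted in front of a type but never numbered themselves.
MODIFIERS = dict([("const", "C"), ("ptr", "P"), ("ref", "R")])


def count(n):
    """Single digits stand alone; larger numbers are wrapped in underscores."""
    return str(n) if n <= 9 else "_%d_" % n


def squangle(t, seen):
    """Mangle type T, replacing repeats of types in SEEN by B codes.

    T is ("prim", code), ("class", name), ("tmpl", name, args) or
    (modifier, inner).  SEEN lists the numbered types and is updated.
    """
    kind = t[0]
    if kind == "prim":
        return t[1]  # primitive types never receive a B number
    if kind in MODIFIERS:
        return MODIFIERS[kind] + squangle(t[1], seen)
    if t in seen:
        return "B" + count(seen.index(t))
    seen.append(t)  # numbered when its encoding starts, before its arguments
    if kind == "class":
        return str(len(t[1])) + t[1]
    name, args = t[1], t[2]
    return ("t" + str(len(name)) + name + str(len(args))
            + "".join("Z" + squangle(a, seen) for a in args))


# The parameter list of f() in the example above.
class3 = ("class", "class3")
class1 = ("tmpl", "class1", (class3,))
class2 = ("tmpl", "class2", (class1,))
params = [class2, ("prim", "i"), ("ref", ("const", class1)), ("ptr", class3)]
seen = []
encoded = "".join(squangle(p, seen) for p in params)
# encoded == "t6class21Zt6class11Z6class3iRCB1PB2"
```
@end example
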
@subsection Qualified names

Both C++ and Java allow a class to be lexically nested inside another
class.  C++ also supports namespaces (not yet implemented by G++).
Java also supports packages.

These are all mangled the same way.  First, the letter @samp{Q}
indicates that we are emitting a qualified name.
That is followed by the number of parts in the qualified name.
If that number is 9 or less, it is emitted with no delimiters.
Otherwise, an underscore is written before and after the count.
Then each part of the qualified name follows, encoded as described above.

For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.

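A Python sketch of the @samp{Q} rule, reusing the simple-name rules from the previous subsection, follows.  The function names are hypothetical, and only Basic Multilingual Plane characters are handled.

@example
```python
import string

PLAIN = set(string.ascii_letters + string.digits + "_")


def mangle_simple_name(name):
    """Simple-name rule from the previous subsection (BMP characters only)."""
    if name[0] not in string.digits and all(c in PLAIN for c in name):
        return str(len(name)) + name
    body = "".join(
        c if c in string.ascii_letters or (i > 0 and c in string.digits)
        else "_%04x" % ord(c)
        for i, c in enumerate(name))
    return "U" + str(len(body)) + body


def mangle_qualified(parts):
    """Q, the part count (wrapped in underscores above 9), then each part."""
    n = len(parts)
    count = str(n) if n <= 9 else "_" + str(n) + "_"
    return "Q" + count + "".join(mangle_simple_name(p) for p in parts)
```
@end example

Here @code{mangle_qualified(["Foo", "\u0319", "Bar"])} reproduces @samp{Q33FooU5_03193Bar}, and a twelve-part name would start with @samp{Q_12_}.
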
1643 @samp{Q33FooU5_03193Bar}.
1644
1645 Squangling utilizes the the letter @samp{K} to indicate a
1646 remembered portion of a qualified name. As qualified names are processed
1647 for an identifier, the names are numbered and remembered in a
1648 manner similar to the @samp{B} type compression code.
1649 Names are recognized left to right, and given increasing values, which are
1650 appended to the code in the standard manner. ie, multiple digit numbers
1651 are delimited by @samp{_} characters.
1652
1653 For example
1654 @example
1655 class Andrew
1656 @{
1657   class WasHere
1658   @{
1659       class AndHereToo
1660       @{
1661       @};
1662   @};
1663 @};
1664
1665 f(Andrew&r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}
1666
1667    K0 ->  Andrew
1668    K1 ->  Andrew::WasHere
1669    K2 ->  Andrew::WasHere::AndHereToo
1670 @end example
1671 Function @samp{f()} would be mangled as :
1672 @samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}
1673
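The @samp{K} numbering for this example can be reproduced with the following Python sketch.  It is a simplified model, not the G++ code: qualified names are lists of ASCII name parts, every prefix is remembered the first time it is emitted (including a plain name such as @code{Andrew}), only @samp{K} compression is modeled (not @samp{B}), and a name that has already been seen in full is assumed to be left to the @samp{B} mechanism and is not modeled.  The function names are hypothetical.

@example
```python
def count(n):
    """Single digits stand alone; larger numbers are wrapped in underscores."""
    return str(n) if n <= 9 else "_%d_" % n


def mangle_class_name(parts, remembered):
    """Mangle the name PARTS, reusing prefixes in REMEMBERED via K codes."""
    start, pieces = 0, []
    # Look for the longest prefix of PARTS that was remembered earlier.
    for end in range(len(parts), 0, -1):
        prefix = tuple(parts[:end])
        if prefix in remembered:
            start, pieces = end, ["K" + count(remembered.index(prefix))]
            break
    # Emit the remaining parts, remembering each new prefix left to right.
    for end in range(start + 1, len(parts) + 1):
        name = parts[end - 1]
        pieces.append(str(len(name)) + name)
        remembered.append(tuple(parts[:end]))
    if start == 0 and len(parts) == 1:
        return pieces[0]  # a plain, unqualified name needs no Q
    return "Q" + count(len(pieces)) + "".join(pieces)


remembered = []
params = [["Andrew"], ["Andrew", "WasHere"], ["Andrew", "WasHere", "AndHereToo"]]
mangled = "f__F" + "".join("R" + mangle_class_name(p, remembered) for p in params)
# mangled == "f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo"
```
@end example
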
On some occasions either a @samp{B} or a @samp{K} code could
be chosen; in that case the @samp{B} code is always preferred.  For
instance, the example in the section on @samp{B} mangling could have used a
@samp{K} code instead of @samp{B2}.

@subsection Templates

A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters.  If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.

A function template specialization (either an instantiation or an
explicit specialization) is encoded by an @samp{H} followed by the
encoding of the template parameters, as described above, followed by an
@samp{_}, the encoding of the argument types to the template function
(not the specialization), another @samp{_}, and the return type.  (Like
the argument types, the return type is the return type of the function
template, not the specialization.)  Template parameters in the argument
and return types are encoded as an @samp{X} (for type parameters) or a
@samp{Y} (for constant parameters), followed by an index indicating their
position in the template parameter list declaration, and then their
template depth.

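The class template rule can be sketched as below.  The names are hypothetical; type arguments are passed already mangled; only type parameters are handled, since the encoding of non-type template arguments is not described here; and the parameter count is written as plain decimal digits, which the examples in this chapter only show for a count of one.

@example
```python
def mangle_class(name):
    """Length-prefixed ASCII class or template name, e.g. "3Foo"."""
    return str(len(name)) + name


def mangle_template_instance(name, type_args):
    """t, the template name, the parameter count, then Z + each type."""
    return ("t" + mangle_class(name) + str(len(type_args))
            + "".join("Z" + arg for arg in type_args))


inner = mangle_template_instance("class1", [mangle_class("class3")])
outer = mangle_template_instance("class2", [inner])
# outer == "t6class21Zt6class11Z6class3"
```
@end example

The result matches the first parameter in the squangling example earlier in this chapter, where no @samp{B} compression has applied yet.
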
@subsection Arrays

C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type.  Of course, array parameter types
normally decay into pointer types, so this encoding rarely appears
in function signatures.

Java arrays are objects.  A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.  Since an array is an
object, a reference to it is mangled as a pointer, hence the leading @samp{P}.
For example, @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.

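Combining this rule with the pointer, template, and qualified-name rules gives the following sketch.  It assumes ASCII package and class names and uses hypothetical function names; the handling of primitive element types (for example @code{int[]}) simply applies the same @code{JArray<T>} rule and is not confirmed by an example in this section.

@example
```python
# Hypothetical codes for Java primitive element types.
JAVA_PRIMITIVES = dict([
    ("byte", "c"), ("short", "s"), ("int", "i"), ("long", "x"),
    ("float", "f"), ("double", "d"), ("boolean", "b"), ("char", "w"),
])


def mangle_java_class(dotted):
    """Mangle a dotted class name such as java.lang.String."""
    parts = dotted.split(".")
    encoded = "".join(str(len(p)) + p for p in parts)
    if len(parts) == 1:
        return encoded
    n = len(parts)
    return "Q" + (str(n) if n <= 9 else "_%d_" % n) + encoded


def mangle_java_type(java_type):
    """Mangle a Java type, treating T[] as a pointer to JArray<T>."""
    if java_type.endswith("[]"):
        # P (object reference) + t6JArray1Z (JArray with one type argument).
        return "Pt6JArray1Z" + mangle_java_type(java_type[:-2])
    if java_type in JAVA_PRIMITIVES:
        return JAVA_PRIMITIVES[java_type]
    # Any other name is a class, so it is an object reference.
    return "P" + mangle_java_class(java_type)
```
@end example
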
@subsection Static fields

Both C++ and Java classes can have static fields.
These are allocated statically, and are shared among all instances.

The mangling starts with a prefix (@samp{_} on most systems), followed
by the mangling of the class name, the ``joiner'', and finally the field name.
The joiner (see @code{JOINER} in @code{cp-tree.h}) is a special
separator character.  For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems).  If the joiner is @samp{_}, then the prefix
is @samp{__static_} instead of just @samp{_}.

For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).

If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the joiner.
Thus @code{\u1234Foo::var\u3445} becomes @samp{_U8_1234FooU.var_3445}.

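A Python sketch of the static-field rule follows.  The function names are hypothetical and the joiner is passed explicitly rather than read from @code{JOINER}.  Two points are assumptions drawn from the examples above rather than stated rules: the @samp{U} indicator is emitted when the field name itself needs escapes, and the field name is written without a length prefix.

@example
```python
import string

PLAIN = set(string.ascii_letters + string.digits + "_")


def needs_escapes(name):
    """True if NAME has a character outside PLAIN or starts with a digit."""
    return name[0] in string.digits or any(c not in PLAIN for c in name)


def escape(name):
    """Keep ASCII letters and non-initial digits; escape the rest as _xxxx."""
    return "".join(
        c if c in string.ascii_letters or (i > 0 and c in string.digits)
        else "_%04x" % ord(c)
        for i, c in enumerate(name))


def mangle_name_part(name):
    """Length-prefixed class name, U-escaped when necessary."""
    if needs_escapes(name):
        body = escape(name)
        return "U" + str(len(body)) + body
    return str(len(name)) + name


def mangle_static_field(class_parts, field, joiner):
    """Prefix, class name, optional U, joiner, then the field name."""
    prefix = "__static_" if joiner == "_" else "_"
    cls = "".join(mangle_name_part(p) for p in class_parts)
    n = len(class_parts)
    if n > 1:
        cls = "Q" + (str(n) if n <= 9 else "_%d_" % n) + cls
    # Assumption: U appears when the field name itself needs escapes.
    if needs_escapes(field):
        return prefix + cls + "U" + joiner + escape(field)
    return prefix + cls + joiner + field
```
@end example

For example, @code{mangle_static_field(["Foo", "Bar"], "var", "_")} returns @samp{__static_Q23Foo3Bar_var}.
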
@subsection Table of demangling code characters

The following special characters are used in mangling:

@table @samp
@item A
Indicates a C++ array type.

@item b
Encodes the C++ @code{bool} type,
and the Java @code{boolean} type.

@item B
Used for squangling.  Similar in concept to the @samp{T} non-squangled code.

@item c
Encodes the C++ @code{char} type, and the Java @code{byte} type.

@item C
A modifier to indicate a @code{const} type.
Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).

@item d
Encodes the C++ and Java @code{double} types.

@item e
Indicates extra unknown arguments @code{...}.

@item E
Indicates the opening parenthesis of an expression.

@item f
Encodes the C++ and Java @code{float} types.

@item F
Used to indicate a function type.

@item H
Used to indicate a template function.

@item i
Encodes the C++ and Java @code{int} types.

@item J
Indicates a complex type.

@item K
Used by squangling to compress qualified names.

@item l
Encodes the C++ @code{long} type.

@item n
Immediate repeated type.  Followed by the repeat count.

@item N
Repeated type.  Followed by the repeat count of the repeated type,
followed by the type index of the repeated type.  Due to a bug in
g++ 2.7.2, this is only generated if the index is 0.  Superseded by
@samp{n} when squangling.

@item P
Indicates a pointer type.  Followed by the type pointed to.

@item Q
Used to mangle qualified names, which arise from nested classes.
Also used for namespaces.
In Java, used to mangle package-qualified names and inner classes.

@item r
Encodes the GNU C++ @code{long double} type.

@item R
Indicates a reference type.  Followed by the referenced type.

@item s
Encodes the C++ and Java @code{short} types.

@item S
A modifier that indicates that the following integer type is signed.
Only used with @code{char}.

Also used as a modifier to indicate a static member function.

@item t
Indicates a template instantiation.

@item T
A back reference to a previously seen type.

@item U
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.

@item u
The @code{restrict} type qualifier.

@item v
Encodes the C++ and Java @code{void} types.

@item V
A modifier for a @code{volatile} type or method.

@item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.

@item W
Indicates the closing parenthesis of an expression.

@item x
Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.

@item X
Encodes a template type parameter, when part of a function type.

@item Y
Encodes a template constant parameter, when part of a function type.

@item Z
Used for template type parameters.

@end table

The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
also appear to be used for more obscure purposes (@samp{G}, for
instance, appears in the squangling example above).

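Reading the table in the other direction, a demangler can decode argument lists made only of builtin codes and the @samp{C}, @samp{V}, @samp{P}, @samp{R}, @samp{U} and @samp{S} modifiers.  The Python sketch below covers only that subset (no class names, templates, or back references), treats @samp{S} only in its @code{signed char} role, describes types in English rather than C++ declarator syntax, and uses hypothetical names.

@example
```python
# Builtin type codes from the table above (a subset).
BUILTINS = dict([
    ("b", "bool"), ("c", "char"), ("d", "double"), ("e", "..."),
    ("f", "float"), ("i", "int"), ("l", "long"), ("r", "long double"),
    ("s", "short"), ("v", "void"), ("w", "wchar_t"), ("x", "long long"),
])

# Modifier codes and the English words they prepend.
MODIFIERS = dict([
    ("C", "const "), ("V", "volatile "), ("U", "unsigned "),
    ("S", "signed "), ("P", "pointer to "), ("R", "reference to "),
])


def decode_type(code, pos=0):
    """Decode one type starting at POS; return (description, next position)."""
    prefix = ""
    # Modifiers stack up left to right in front of the base type.
    while code[pos] in MODIFIERS:
        prefix += MODIFIERS[code[pos]]
        pos += 1
    return prefix + BUILTINS[code[pos]], pos + 1


def decode_args(code):
    """Decode a whole argument list such as "PCci"."""
    result, pos = [], 0
    while pos < len(code):
        text, pos = decode_type(code, pos)
        result.append(text)
    return result
```
@end example

For example, @code{decode_args("PCci")} describes a pointer to @code{const char} followed by an @code{int}, while @code{decode_args("CPc")} describes a @code{const} pointer to @code{char}.
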
@node Concept Index,  , Mangling, Top

@section Concept Index

@printindex cp

@bye