gcc/cp/gxxint.texi

   1 \input texinfo  @c -*-texinfo-*-
   2 @c %**start of header
   3 @setfilename g++int.info
   4 @settitle G++ internals
   5 @setchapternewpage odd
   6 @ifinfo
   7 @dircategory Programming
   8 @direntry
   9 * G++ internals: (g++int).     G++ Internals.
  10 @end direntry
  11 @end ifinfo
  12 @c %**end of header
  13
  14 @node Top, Limitations of g++, (dir), (dir)
  15 @chapter Internal Architecture of the Compiler
  16
  17 This is meant to describe the C++ front-end for gcc in detail.
  18 Questions and comments to Jason Merrill @email{jason@@redhat.com} and
  19 Mark Mitchell @email{mark@@codesourcery.com}.
  20
  21 @menu
  22 * Limitations of g++::
  23 * Routines::
  24 * Implementation Specifics::
  25 * Glossary::
  26 * Macros::
  27 * Typical Behavior::
  28 * Coding Conventions::
  29 * Templates::
  30 * Access Control::
  31 * Error Reporting::
  32 * Parser::
  33 * Exception Handling::
  34 * Free Store::
  35 * Mangling::  Function name mangling for C++ and Java
  36 * Concept Index::
  37 @end menu
  38
  39 @node Limitations of g++, Routines, Top, Top
  40 @section Limitations of g++
  41
  42 @itemize @bullet
  43 @item
  44 Limitations on input source code: about 240 nesting levels with the parser
  45 stack size (YYSTACKSIZE) set to 500 (the default); each nesting level
  46 requires around 16.4k of swap space.  The parser needs stack space of
  47 about 2.09 times the number of nesting levels.
  48
  49 @cindex pushdecl_class_level
  50 @item
  51 I suspect there are other uses of pushdecl_class_level that do not call
  52 set_identifier_type_value in tandem with the call to
  53 pushdecl_class_level.  It would seem to be an omission.
  54
  55 @end itemize
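
The figures in the first item can be combined into a rough estimate.
The sketch below is only arithmetic on the numbers quoted above.  The
constant and function names are invented for illustration, and it
assumes the 2.09 factor is counted in parser stack entries.

```cpp
// Figures quoted in the text above; the names are made up for this sketch.
constexpr double kStackSize = 500.0;       // YYSTACKSIZE default
constexpr double kStackPerLevel = 2.09;    // assumed: stack entries per level
constexpr double kSwapPerLevelKb = 16.4;   // swap per nesting level, in k

// Estimated number of nesting levels that fit in the parser stack.
constexpr double max_nesting_levels() {
    return kStackSize / kStackPerLevel;
}

// Estimated swap, in k, needed for the given number of nesting levels.
constexpr double swap_kb(double levels) {
    return levels * kSwapPerLevelKb;
}
```

Dividing 500 by 2.09 gives a little over 239 levels, which agrees with
the quoted figure of roughly 240.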
  56
  57 @node Routines, Implementation Specifics, Limitations of g++, Top
  58 @section Routines
  59
  60 This section describes some of the routines used in the C++ front-end.
  61
  62 @code{build_vtable} and @code{prepare_fresh_vtable} are used only within
  63 the @file{cp-class.c} file, and only in @code{finish_struct} and
  64 @code{modify_vtable_entries}.
  65
  66 @code{build_vtable}, @code{prepare_fresh_vtable}, and
  67 @code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
  68
  69 @code{finish_struct} can steal the virtual function table from parents;
  70 this prevents @code{related_vslot} from working.  When @code{finish_struct}
  71 steals, we know that
  72
  73 @example
  74 get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
  75 @end example
  76
  77 @noindent
  78 will get the related binfo.
  79
  80 @code{layout_basetypes} does something with the VIRTUALS.
  81
  82 Supposedly (according to Tiemann) most of the breadth first searching
  83 done, like in @code{get_base_distance} and in @code{get_binfo}, was not
  84 because of any design decision.  I have since found out that at least one
  85 part of the compiler needs the notion of depth first binfo searching.  I
  86 am going to try to convert the whole thing; it should just work.  The
  87 term left-most refers to the depth first left-most node.  It uses
  88 @code{MAIN_VARIANT == type} as the condition to get left-most, because
  89 the things that have @code{BINFO_OFFSET}s of zero are shared and will
  90 have themselves as their own @code{MAIN_VARIANT}s.  The non-shared right
  91 ones are copies of the left-most one; hence if it is its own
  92 @code{MAIN_VARIANT}, we know it IS a left-most one, and if it is not, it
  93 is a non-left-most one.
  94
  95 @code{get_base_distance}'s path and distance matter in its use in:
  96
  97 @itemize @bullet
  98 @item
  99 @code{prepare_fresh_vtable} (the code is probably wrong)
 100 @item
 101 @code{init_vfields} depends upon distance, probably in a safe way;
 102 @code{build_offset_ref} might use partial paths to do further lookups;
 103 @code{hack_identifier} is probably not properly checking access.
 104
 105 @item
 106 @code{get_first_matching_virtual} probably should check for
 107 @code{get_base_distance} returning -2.
 108
 109 @item
 110 @code{resolve_offset_ref} should be called in a more deterministic
 111 manner.  Right now, it is called in some random contexts, like for
 112 arguments at @code{build_method_call} time, @code{default_conversion}
 113 time, @code{convert_arguments} time, @code{build_unary_op} time,
 114 @code{build_c_cast} time, @code{build_modify_expr} time,
 115 @code{convert_for_assignment} time, and
 116 @code{convert_for_initialization} time.
 117
 118 But there are still more contexts in which it needs to be called; one was
 119 the ever simple:
 120
 121 @example
 122 if (obj.*pmi != 7)
 123    @dots{}
 124 @end example
 125
 126 It seems that the problems were due to the fact that @code{TREE_TYPE} of
 127 the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
 128 of the referent (like @code{INTEGER_TYPE}).  This problem was fixed by
 129 changing @code{default_conversion} to check @code{TREE_CODE (x)},
 130 instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
 131 was @code{OFFSET_TYPE}.
 132
 133 @end itemize
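
For reference, the failing construct quoted above can be written out as a
complete fragment.  This is ordinary C++ and says nothing about the
front end's internals; the names @code{obj_t}, @code{value} and
@code{is_not_seven} are invented for illustration.  According to the
text above, the operand written with @code{.*} is what the front end
represents as an @code{OFFSET_REF}.

```cpp
// Hypothetical class with one data member to point at.
struct obj_t {
    int value;
};

// True when the member selected by pmi in obj differs from 7.
// The condition is the pointer-to-member test discussed above.
bool is_not_seven(const obj_t &obj, int obj_t::*pmi) {
    if (obj.*pmi != 7)
        return true;
    return false;
}
```

A caller would form the pointer to member with @code{&obj_t::value} and
pass it together with an object.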
 134
 135 @node Implementation Specifics, Glossary, Routines, Top
 136 @section Implementation Specifics
 137
 138 @itemize @bullet
 139 @item Explicit Initialization
 140
 141 The global list @code{current_member_init_list} contains the list of
 142 mem-initializers specified in a constructor declaration.  For example:
 143
 144 @example
 145 foo::foo() : a(1), b(2) @{@}
 146 @end example
 147
 148 @noindent
 149 will initialize @samp{a} with 1 and @samp{b} with 2.
 150 @code{expand_member_init} places each initialization (a with 1) on the
 151 global list.  Then, when the fndecl is being processed,
 152 @code{emit_base_init} runs down the list, initializing them.  It used to
 153 be the case that g++ first ran down @code{current_member_init_list},
 154 then ran down the list of members initializing the ones that weren't
 155 explicitly initialized.  Things were rewritten to perform the
 156 initializations in order of declaration in the class.  So, for the above
 157 example, @samp{a} and @samp{b} will be initialized in the order that
 158 they were declared:
 159
 160 @example
 161 class foo @{ public: int b; int a; foo (); @};
 162 @end example
 163
 164 @noindent
 165 Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
 166 initialized with 1, regardless of how they're listed in the mem-initializer.
 167
 168 @item The Explicit Keyword
 169
 170 @code{grokdeclarator} uses the @code{explicit} keyword on a constructor
 171 to set the field @code{DECL_NONCONVERTING_P}.  That value is used by
 172 @code{build_method_call} and @code{build_user_type_conversion_1} to decide
 173 if a particular constructor should be used as a candidate for conversions.
 174
 175 @end itemize
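
Both items above can be observed from ordinary C++.  The following is a
minimal sketch of the language-level behavior only, not of the front
end's internal lists.  The logging helper @code{note}, the
@code{init_log} vector, and the @code{wrapper} and @code{take} names
are invented for illustration.

```cpp
#include <vector>

// Records the order in which member initializers actually run.
static std::vector<char> init_log;

static int note(char member, int value) {
    init_log.push_back(member);
    return value;
}

// Same shape as the class in the text: b is declared before a.
class foo {
public:
    int b;
    int a;
    foo();
};

// The mem-initializers name a first, but b is declared first, so b is
// initialized first.  (Compilers may warn about the mismatched order.)
foo::foo() : a(note('a', 1)), b(note('b', 2)) {}

// An explicit constructor is not a candidate for implicit conversions.
struct wrapper {
    explicit wrapper(int v) : value(v) {}
    int value;
};

int take(wrapper w) { return w.value; }
// take(5);           // ill-formed: would need an implicit int -> wrapper
// take(wrapper(5));  // fine: the conversion is spelled out
```

After a @code{foo} has been constructed, @code{init_log} holds the
member letters in the order their initializers ran.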
 176
 177 @node Glossary, Macros, Implementation Specifics, Top
 178 @section Glossary
 179
 180 @table @r
 181 @item binfo
 182 The main data structure in the compiler used to represent the
 183 inheritance relationships between classes.  The data in the binfo can be
 184 accessed by the BINFO_ accessor macros.
 185
 186 @item vtable
 187 @itemx virtual function table
 188
 189 The virtual function table holds information used in virtual function
 190 dispatching.  In the compiler, they are usually referred to as vtables,
 191 or vtbls.  The first index is not used in the normal way; I believe it
 192 is probably used for the virtual destructor.
 193
 194 @item vfield
 195
 196 vfields can be thought of as the base information needed to build
 197 vtables.  For every vtable that exists for a class, there is a vfield.
 198 See also vtable and virtual function table pointer.  When a type is used
 199 as a base class to another type, the virtual function table for the
 200 derived class can be based upon the vtable for the base class, just
 201 extended to include the additional virtual methods declared in the
 202 derived class.  The virtual function table from a virtual base class is
 203 never reused in a derived class.  @code{is_normal} depends upon this.
 204
 205 @item virtual function table pointer
 206
 207 These are @code{FIELD_DECL}s that are pointer types that point to
 208 vtables.  See also vtable and vfield.
 209 @end table
 210
 211 @node Macros, Typical Behavior, Glossary, Top
 212 @section Macros
 213
 214 This section describes some of the macros used on trees.  The list
 215 should be alphabetical.  Eventually all macros should be documented
 216 here.
 217
 218 @table @code
 219 @item BINFO_BASETYPES
 220 A vector of additional binfos for the types inherited by this basetype.
 221 The binfos are fully unshared (except for virtual bases, in which
 222 case the binfo structure is shared).
 223
 224    If this basetype describes type D as inherited in C,
 225    and if the basetypes of D are E and F,
 226    then this vector contains binfos for inheritance of E and F by C.
 227
 228 Has values of:
 229
 230         TREE_VECs
 231
 232
 233 @item BINFO_INHERITANCE_CHAIN
 234 Temporarily used to represent specific inheritances.  It usually points
 235 to the binfo associated with the lesser derived type, but it can be
 236 reversed by reverse_path.  For example:
 237
 238 @example
 239         Z ZbY   least derived
 240         |
 241         Y YbX
 242         |
 243         X Xb    most derived
 244
 245 TYPE_BINFO (X) == Xb
 246 BINFO_INHERITANCE_CHAIN (Xb) == YbX
 247 BINFO_INHERITANCE_CHAIN (Yb) == ZbY
 248 BINFO_INHERITANCE_CHAIN (Zb) == 0
 249 @end example
 250
 251 Not sure if the above is really true; get_base_distance has it point
 252 towards the most derived type, opposite from above.
 253
 254 Set by build_vbase_path, recursive_bounded_basetype_p,
 255 get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
 256
 257 What things can this be used on:
 258
 259         TREE_VECs that are binfos
 260
 261
 262 @item BINFO_OFFSET
 263 The offset where this basetype appears in its containing type.
 264 BINFO_OFFSET slot holds the offset (in bytes) from the base of the
 265 complete object to the base of the part of the object that is allocated
 266 on behalf of this `type'.  This is always 0 except when there is
 267 multiple inheritance.
 268
 269 Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
 270
 271
 272 @item BINFO_VIRTUALS
 273 A unique list of functions for the virtual function table.  See also
 274 TYPE_BINFO_VIRTUALS.
 275
 276 What things can this be used on:
 277
 278         TREE_VECs that are binfos
 279
 280
 281 @item BINFO_VTABLE
 282 Used to find the VAR_DECL that is the virtual function table associated
 283 with this binfo.  See also TYPE_BINFO_VTABLE.  To get the virtual
 284 function table pointer, see CLASSTYPE_VFIELD.
 285
 286 What things can this be used on:
 287
 288         TREE_VECs that are binfos
 289
 290 Has values of:
 291
 292         VAR_DECLs that are virtual function tables
 293
 294
 295 @item BLOCK_SUPERCONTEXT
 296 In the outermost scope of each function, it points to the FUNCTION_DECL
 297 node.  It aids in better DWARF support of inline functions.
 298
 299
 300 @item CLASSTYPE_TAGS
 301 CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
 302 class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
 303 these and calls pushtag on them.)
 304
 305 finish_struct scans these to produce TYPE_DECLs to add to the
 306 TYPE_FIELDS of the type.
 307
 308 It is expected that the name found in the TREE_PURPOSE slot is unique;
 309 resolve_scope_to_name is one place that depends upon this
 310 uniqueness.
 311
 312
 313 @item CLASSTYPE_METHOD_VEC
 314 The following is true after finish_struct has been called (on the
 315 class?) but not before.  Before finish_struct is called, things are
 316 different to some extent.  Contains a TREE_VEC of methods of the class.
 317 The TREE_VEC_LENGTH is the number of differently named methods plus one
 318 for the 0th entry.  The 0th entry is always allocated, and reserved for
 319 ctors and dtors.  If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
 320 Each entry of the TREE_VEC is a FUNCTION_DECL.  For each FUNCTION_DECL,
 321 there is a DECL_CHAIN slot.  If the FUNCTION_DECL is the last one with a
 322 given name, the DECL_CHAIN slot is NULL_TREE.  Otherwise it is the next
 323 method that has the same name (but a different signature).  Using the
 324 DECL_CHAIN slot in this way does not seem to prevent calling pushdecl to
 325 put the method in the global scope: pushdecl would overwrite the
 326 TREE_CHAIN slot, but the two uses rely on different _CHAINs.
 327 finish_struct_methods sets up one version of the TREE_CHAIN slots on
 328 the FUNCTION_DECLs.
 329
 330 friends are kept in TREE_LISTs, so that there's no need to use their
 331 TREE_CHAIN slot for anything.
 332
 333 Has values of:
 334
 335         TREE_VECs
 336
 337
 338 @item CLASSTYPE_VFIELD
 339 Seems to be in the process of being renamed TYPE_VFIELD.  Use on types
 340 to get the main virtual function table pointer.  To get the virtual
 341 function table use BINFO_VTABLE (TYPE_BINFO ()).
 342
 343 Has values of:
 344
 345         FIELD_DECLs that are virtual function table pointers
 346
 347 What things can this be used on:
 348
 349         RECORD_TYPEs
 350
 351
 352 @item DECL_CLASS_CONTEXT
 353 Identifies the context that the _DECL was found in.  For virtual function
 354 tables, it points to the type associated with the virtual function
 355 table.  See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
 356
 357 The difference between this and DECL_CONTEXT shows up for virtual
 358 functions like:
 359
 360 @example
 361 struct A
 362 @{
 363   virtual int f ();
 364 @};
 365
 366 struct B : A
 367 @{
 368   int f ();
 369 @};
 370
 371 DECL_CONTEXT (A::f) == A
 372 DECL_CLASS_CONTEXT (A::f) == A
 373
 374 DECL_CONTEXT (B::f) == A
 375 DECL_CLASS_CONTEXT (B::f) == B
 376 @end example
 377
 378 Has values of:
 379
 380         RECORD_TYPEs, or UNION_TYPEs
 381
 382 What things can this be used on:
 383
 384         TYPE_DECLs, _DECLs
 385
 386
 387 @item DECL_CONTEXT
 388 Identifies the context that the _DECL was found in.  Can be used on
 389 virtual function tables to find the type associated with the virtual
 390 function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
 391 better access method.  Internally the same as DECL_FIELD_CONTEXT, so
 392 don't use both.  See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
 393 DECL_CLASS_CONTEXT.
 394
 395 Has values of:
 396
 397         RECORD_TYPEs
 398
 399
 400 What things can this be used on:
 401
 402 @display
 403 VAR_DECLs that are virtual function tables
 404 _DECLs
 405 @end display
 406
 407
 408 @item DECL_FIELD_CONTEXT
 409 Identifies the context that the FIELD_DECL was found in.  Internally the
 410 same as DECL_CONTEXT, so don't use both.  See also DECL_CONTEXT,
 411 DECL_FCONTEXT and DECL_CLASS_CONTEXT.
 412
 413 Has values of:
 414
 415         RECORD_TYPEs
 416
 417 What things can this be used on:
 418
 419 @display
 420 FIELD_DECLs that are virtual function pointers
 421 FIELD_DECLs
 422 @end display
 423
 424
 425 @item DECL_NAME
 426
 427 Has values of:
 428
 429 @display
 430 0 for things that don't have names
 431 IDENTIFIER_NODEs for TYPE_DECLs
 432 @end display
 433
 434 @item DECL_IGNORED_P
 435 A bit that can be set to inform the debug information output routines in
 436 the back-end that a certain _DECL node should be totally ignored.
 437
 438 Used in cases where it is known that the debugging information will be
 439 output in another file, or where a sub-type is known not to be needed
 440 because the enclosing type is not needed.
 441
 442 A compiler constructed virtual destructor in derived classes that do not
 443 define an explicit destructor that was defined explicit in a base class
 444 has this bit set as well.  Also used on __FUNCTION__ and
 445 __PRETTY_FUNCTION__ to mark they are ``compiler generated.''  c-decl and
 446 c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
 447 and ``user-invisible variable.''
 448
 449 Functions built by the C++ front-end such as default destructors,
 450 virtual destructors and default constructors want to be marked that
 451 they are compiler generated, but it is unclear why.
 452
 453 Currently, it is used in an absolute way in the C++ front-end, as an
 454 optimization, to tell the debug information output routines to not
 455 generate debugging information that will be output by another separately
 456 compiled file.
 457
 458
 459 @item DECL_VIRTUAL_P
 460 A flag used on FIELD_DECLs and VAR_DECLs.  (Documentation in tree.h is
 461 wrong.)  Used in VAR_DECLs to indicate that the variable is a vtable.
 462 It is also used in FIELD_DECLs for vtable pointers.
 463
 464 What things can this be used on:
 465
 466         FIELD_DECLs and VAR_DECLs
 467
 468
 469 @item DECL_VPARENT
 470 Used to point to the parent type of the vtable if there is one, else it
 471 is just the type associated with the vtable.  Because of the sharing of
 472 virtual function tables that goes on, this slot is not very useful, and
 473 is in fact, not used in the compiler at all.  It can be removed.
 474
 475 What things can this be used on:
 476
 477         VAR_DECLs that are virtual function tables
 478
 479 Has values of:
 480
 481         RECORD_TYPEs maybe UNION_TYPEs
 482
 483
 484 @item DECL_FCONTEXT
 485 Used to find the first baseclass in which this FIELD_DECL is defined.
 486 See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
 487
 488 How it is used:
 489
 490         Used when writing out debugging information about vfield and
 491         vbase decls.
 492
 493 What things can this be used on:
 494
 495         FIELD_DECLs that are virtual function pointers
 496         FIELD_DECLs
 497
 498
 499 @item DECL_REFERENCE_SLOT
 500 Used to hold the initializer for the reference.
 501
 502 What things can this be used on:
 503
 504         PARM_DECLs and VAR_DECLs that have a reference type
 505
 506
 507 @item DECL_VINDEX
 508 Used for FUNCTION_DECLs in two different ways.  Before the structure
 509 containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
 510 FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
 511 FUNCTION_DECL will replace as a virtual function.  When the class is
 512 laid out, this pointer is changed to an INTEGER_CST node which is
 513 suitable to find an index into the virtual function table.  See
 514 get_vtable_entry as to how one can find the right index into the virtual
 515 function table.  The first index, 0, of a virtual function table is not
 516 used in the normal way, so the first real index is 1.
 517
 518 DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
 519 overridden FUNCTION_DECLs.  add_virtual_function has code to deal with
 520 this when it uses the variable base_fndecl_list, but it would seem that
 521 somehow, it is possible for the TREE_LIST to persist until method_call,
 522 and it should not.
 523
 524
 525 What things can this be used on:
 526
 527         FUNCTION_DECLs
 528
 529
 530 @item DECL_SOURCE_FILE
 531 Identifies what source file a particular declaration was found in.
 532
 533 Has values of:
 534
 535         "<built-in>" on TYPE_DECLs to mean the typedef is built in
 536
 537
 538 @item DECL_SOURCE_LINE
 539 Identifies what source line number in the source file the declaration
 540 was found at.
 541
 542 Has values of:
 543
 544 @display
 545 0 for an undefined label
 546
 547 0 for TYPE_DECLs that are internally generated
 548
 549 0 for FUNCTION_DECLs for functions generated by the compiler
 550         (not yet, but should be)
 551
 552 0 for ``magic'' arguments to functions, that the user has no
 553         control over
 554 @end display
 555
 556
 557 @item TREE_USED
 558
 559 Has values of:
 560
 561         0 for unused labels
 562
 563
 564 @item TREE_ADDRESSABLE
 565 A flag that is set for any type that has a constructor.
 566
 567
 568 @item TREE_COMPLEXITY
 569 They seem to be a kludge to track recursion, popping, and pushing.  They
 570 only appear in cp-decl.c and cp-decl2.c, so they are a good candidate for
 571 proper fixing, and removal.
 572
 573
 574 @item TREE_HAS_CONSTRUCTOR
 575 A flag to indicate when a CALL_EXPR represents a call to a constructor.
 576 If set, we know that the type of the object is the complete type of the
 577 object, and that the value returned is nonnull.  When used in this
 578 fashion, it is an optimization.  Can also be used on SAVE_EXPRs to
 579 indicate when they are of fixed type and nonnull.  Can also be used on
 580 INDIRECT_EXPRs on CALL_EXPRs that represent a call to a constructor.
 581
 582
 583 @item TREE_PRIVATE
 584 Set for FIELD_DECLs by finish_struct.  But not uniformly set.
 585
 586 The following routines do something with PRIVATE access:
 587 build_method_call, alter_access, finish_struct_methods,
 588 finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
 589 CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
 590 GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
 591
 592
 593 @item TREE_PROTECTED
 594 The following routines do something with PROTECTED access:
 595 build_method_call, alter_access, finish_struct, convert_to_aggr,
 596 CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
 597 compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
 598 dbxout_type_method_1
 599
 600
 601 @item TYPE_BINFO
 602 Used to get the binfo for the type.
 603
 604 Has values of:
 605
 606         TREE_VECs that are binfos
 607
 608 What things can this be used on:
 609
 610         RECORD_TYPEs
 611
 612
 613 @item TYPE_BINFO_BASETYPES
 614 See also BINFO_BASETYPES.
 615
 616 @item TYPE_BINFO_VIRTUALS
 617 A unique list of functions for the virtual function table.  See also
 618 BINFO_VIRTUALS.
 619
 620 What things can this be used on:
 621
 622         RECORD_TYPEs
 623
 624
 625 @item TYPE_BINFO_VTABLE
 626 Points to the virtual function table associated with the given type.
 627 See also BINFO_VTABLE.
 628
 629 What things can this be used on:
 630
 631         RECORD_TYPEs
 632
 633 Has values of:
 634
 635         VAR_DECLs that are virtual function tables
 636
 637
 638 @item TYPE_NAME
 639 Names the type.
 640
 641 Has values of:
 642
 643 @display
 644 0 for things that don't have names.
 645 should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
 646         ENUM_TYPEs.
 647 TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
 648         shouldn't be.
 649 TYPE_DECL for typedefs, unsure why.
 650 @end display
 651
 652 What things can one use this on:
 653
 654 @display
 655 TYPE_DECLs
 656 RECORD_TYPEs
 657 UNION_TYPEs
 658 ENUM_TYPEs
 659 @end display
 660
 661 History:
 662
 663         It currently points to the TYPE_DECL for RECORD_TYPEs,
 664         UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
 665
 666
 667 @item TYPE_METHODS
 668 Synonym for @code{CLASSTYPE_METHOD_VEC}.  Chained together with
 669 @code{TREE_CHAIN}.  @file{dbxout.c} uses this to get at the methods of a
 670 class.
 671
 672
 673 @item TYPE_DECL
 674 Used to represent typedefs, and used to represent binding layers.
 675
 676 Components:
 677
 678         DECL_NAME is the name of the typedef.  For example, foo would
 679         be found in the DECL_NAME slot when @code{typedef int foo;} is
 680         seen.
 681
 682         DECL_SOURCE_LINE identifies what source line number in the
 683         source file the declaration was found at.  A value of 0
 684         indicates that this TYPE_DECL is just an internal binding layer
 685         marker, and does not correspond to a user supplied typedef.
 686
 687         DECL_SOURCE_FILE
 688
 689 @item TYPE_FIELDS
 690 A linked list (via @code{TREE_CHAIN}) of member types of a class.  The
 691 list can contain @code{TYPE_DECL}s, but there can also be other things
 692 in the list apparently.  See also @code{CLASSTYPE_TAGS}.
 693
 694
 695 @item TYPE_VIRTUAL_P
 696 A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
 697 a virtual function table or a pointer to one.  When used on a
 698 @code{FUNCTION_DECL}, indicates that it is a virtual function.  When
 699 used on an @code{IDENTIFIER_NODE}, indicates that a function with this
 700 same name exists and has been declared virtual.
 701
 702 When used on types, it indicates that the type has virtual functions, or
 703 is derived from one that does.
 704
 705 Not sure if the above about virtual function tables is still true.  See
 706 also info on @code{DECL_VIRTUAL_P}.
 707
 708 What things can this be used on:
 709
 710         FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
 711
 712
 713 @item VF_BASETYPE_VALUE
 714 Get the associated type from the binfo that caused the given vfield to
 715 exist.  This is the least derived class (the most parent class) that
 716 needed a virtual function table.  It is probably the case that all uses
 717 of this field are misguided, but they need to be examined on a
 718 case-by-case basis.  See history for more information on why the
 719 previous statement was made.
 720
 721 Set at @code{finish_base_struct} time.
 722
 723 What things can this be used on:
 724
 725         TREE_LISTs that are vfields
 726
 727 History:
 728
 729         This field was used to determine if a virtual function table's
 730         slot should be filled in with a certain virtual function, by
 731         checking to see if the type returned by VF_BASETYPE_VALUE was a
 732         parent of the context in which the old virtual function existed.
 733         This incorrectly assumes that a given type _could_ not appear as
 734         a parent twice in a given inheritance lattice.  For single
 735         inheritance, this would in fact work, because a type could not
 736         possibly appear more than once in an inheritance lattice, but
 737         with multiple inheritance, a type can appear more than once.
 738
 739
 740 @item VF_BINFO_VALUE
 741 Identifies the binfo that caused this vfield to exist.  If this vfield
 742 is from the first direct base class that has a virtual function table,
 743 then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
 744 direct base where the vfield came from.  Can use @code{TREE_VIA_VIRTUAL}
 745 on result to find out if it is a virtual base class.  Related to the
 746 binfo found by
 747
 748 @example
 749 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 750 @end example
 751
 752 @noindent
 753 where @samp{t} is the type that has the given vfield.
 754
 755 @example
 756 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
 757 @end example
 758
 759 @noindent
 760 will return the binfo for the given vfield.
 761
 762 May or may not be set at @code{modify_vtable_entries} time.  Set at
 763 @code{finish_base_struct} time.
 764
 765 What things can this be used on:
 766
 767         TREE_LISTs that are vfields
 768
 769
 770 @item VF_DERIVED_VALUE
 771 Identifies the type of the most derived class of the vfield, excluding
 772 the class this vfield is for.
 773
 774 Set at @code{finish_base_struct} time.
 775
 776 What things can this be used on:
 777
 778         TREE_LISTs that are vfields
 779
 780
 781 @item VF_NORMAL_VALUE
 782 Identifies the type of the most derived class of the vfield, including
 783 the class this vfield is for.
 784
 785 Set at @code{finish_base_struct} time.
 786
 787 What things can this be used on:
 788
 789         TREE_LISTs that are vfields
 790
 791
 792 @item WRITABLE_VTABLES
 793 This is an option that can be defined when building the compiler; it
 794 causes the compiler to output vtables into the data segment so that
 795 the vtables may be written.  It is undefined by default, because
 796 normally the vtables should be unwritable.  People who implement object
 797 I/O facilities, or who want to change the dynamic type of objects, may
 798 want to have the vtables writable.  Another way of achieving this would
 799 be to copy the vtable into writable memory, but the drawback there is
 800 that the method only changes the type for one object.
 801
 802 @end table
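
Two points from the table above can be seen from ordinary C++: the
@code{BINFO_OFFSET} entry says the offset is 0 except under multiple
inheritance, and the history note under @code{VF_BASETYPE_VALUE} says a
type can appear more than once in an inheritance lattice.  The sketch
below illustrates both at the language level.  All class and function
names are invented, the exact offsets depend on the ABI, and nothing
here touches the compiler's binfo structures.

```cpp
#include <cstddef>

struct A { int a; };
struct B { int b; };
struct C : A, B { int c; };   // two direct bases

// Byte distance from the start of the complete object to a base subobject.
template <class Base, class Derived>
std::ptrdiff_t base_offset(Derived &d) {
    Base *base = &d;  // implicit derived-to-base conversion
    return reinterpret_cast<char *>(base) - reinterpret_cast<char *>(&d);
}

struct Top { int t; };
struct Left : Top {};
struct Right : Top {};
struct Bottom : Left, Right {};   // Top appears twice in the lattice

// The two Top subobjects of a Bottom, reached through each parent.
inline Top *top_via_left(Bottom &x) { return static_cast<Left *>(&x); }
inline Top *top_via_right(Bottom &x) { return static_cast<Right *>(&x); }
```

On the common ABIs the first base of @code{C} sits at offset 0 and the
second at a positive offset, and the two @code{Top} subobjects of a
@code{Bottom} have different addresses.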
 803
 804 @node Typical Behavior, Coding Conventions, Macros, Top
 805 @section Typical Behavior
 806
 807 @cindex parse errors
 808
 809 Whenever seemingly normal code fails with errors like
 810 @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
 811 returning a NULL_TREE for whatever reason.
 812
 813 @node Coding Conventions, Templates, Typical Behavior, Top
 814 @section Coding Conventions
 815
 816 It should never be the case that trees are modified in-place by the
 817 back-end, @emph{unless} it is guaranteed that the semantics are the same
 818 no matter how shared the tree structure is.  @file{fold-const.c} still
 819 has some cases where this is not true, but rms hypothesizes that this
 820 will never be a problem.
 821
 822 @node Templates, Access Control, Coding Conventions, Top
 823 @section Templates
 824
 825 A template is represented by a @code{TEMPLATE_DECL}.  The specific
 826 fields used are:
 827
 828 @table @code
 829 @item DECL_TEMPLATE_RESULT
 830 The generic decl on which instantiations are based.  This looks just
 831 like any other decl.
 832
 833 @item DECL_TEMPLATE_PARMS
 834 The parameters to this template.
 835 @end table
 836
 837 The generic decl is parsed as much like any other decl as possible,
 838 given the parameterization.  The template decl is not built up until the
 839 generic decl has been completed.  For template classes, a template decl
 840 is generated for each member function and static data member, as well.
 841
 842 Template members of template classes are represented by a TEMPLATE_DECL
 843 for the class' parameters around another TEMPLATE_DECL for the member's
 844 parameters.
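
At the source level, a template member of a template class looks like
the following.  The names @code{box} and @code{combined_size} are
invented for illustration.  The out-of-class definition spells the two
parameter lists separately, outer first, which mirrors the nesting
described above.

```cpp
#include <cstddef>

// Class template: the outer parameter list is <class T>.
template <class T>
struct box {
    T value;

    // Member template: its own parameter list <class U> nests inside
    // the class's parameters.
    template <class U>
    std::size_t combined_size(const U &) const;
};

// Out-of-class definition spells both parameter lists, outer first.
template <class T>
template <class U>
std::size_t box<T>::combined_size(const U &) const {
    return sizeof(T) + sizeof(U);
}
```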
 845
 846 All declarations that are instantiations or specializations of templates
 847 refer to their template and parameters through DECL_TEMPLATE_INFO.
 848
 849 How should I handle parsing member functions with the proper param
 850 decls?  Set them up again or try to use the same ones?  Currently we do
 851 the former.  We can probably do this without any extra machinery in
 852 store_pending_inline, by deducing the parameters from the decl in
 853 do_pending_inlines.  PRE_PARSED_TEMPLATE_DECL?
 854
 855 If a base is a parm, we can't check anything about it.  If a base is not
 856 a parm, we need to check it for name binding.  Do finish_base_struct if
 857 no bases are parameterized (only if none, including indirect, are
 858 parms).  Nah, don't bother trying to do any of this until instantiation
 859 -- we only need to do name binding in advance.
 860
 861 Always set up method vec and fields, inc. synthesized methods.  Really?
 862 We can't know the types of the copy folks, or whether we need a
 863 destructor, or can have a default ctor, until we know our bases and
 864 fields.  Otherwise, we can assume and fix ourselves later.  Hopefully.
 865
 866 @node Access Control, Error Reporting, Templates, Top
 867 @section Access Control
 868 The function compute_access returns one of three values:
 869
 870 @table @code
 871 @item access_public
 872 means that the field can be accessed by the current lexical scope.
 873
 874 @item access_protected
 875 means that the field cannot be accessed by the current lexical scope
 876 because it is protected.
 877
 878 @item access_private
 879 means that the field cannot be accessed by the current lexical scope
 880 because it is private.
 881 @end table
 882
 883 DECL_ACCESS is used for access declarations; alter_access creates a list
 884 of types and accesses for a given decl.
 885
 886 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
 887 codes of compute_access and were used as a cache for compute_access.
 888 Now they are not used at all.
 889
 890 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
 891 granted by the containing class.  BEWARE: TREE_PUBLIC means something
 892 completely unrelated to access control!
 893
 894 @node Error Reporting, Parser, Access Control, Top
 895 @section Error Reporting
 896
 897 The C++ front-end uses a call-back mechanism to allow functions to print
 898 out reasonable strings for types and functions without putting extra
 899 logic in the functions where errors are found.  The interface is through
 900 the @code{cp_error} function (or @code{cp_warning}, etc.).  The
 901 syntax is exactly like that of @code{error}, except that a few more
 902 conversions are supported:
 903
 904 @itemize @bullet
 905 @item
 906 %C indicates a value of `enum tree_code'.
 907 @item
 908 %D indicates a *_DECL node.
 909 @item
 910 %E indicates a *_EXPR node.
 911 @item
 912 %L indicates a value of `enum languages'.
 913 @item
 914 %P indicates the name of a parameter (i.e. "this", "1", "2", ...)
 915 @item
 916 %T indicates a *_TYPE node.
 917 @item
 918 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
 919
 920 @end itemize
 921
 922 There is some overlap between these; for instance, any of the node
 923 options can be used for printing an identifier (though only @code{%D}
 924 tries to decipher function names).
 925
 926 For a more verbose message (@code{class foo} as opposed to just @code{foo},
 927 including the return type for functions), use @code{%#c}.
 928 To have the line number on the error message indicate the line of the
 929 DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
 930 use @code{%+D}, or it will default to the first.
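
As an illustration of the call-back idea only, the following
self-contained sketch dispatches on a conversion letter to a printer
function, so the caller needs no formatting logic of its own.  The names
@code{node}, @code{printer_for} and @code{format_diag} are hypothetical
and are not part of the front-end; the real @code{cp_error} operates on
tree nodes and supports more conversions than the two shown here.

@example
#include <string>

// Hypothetical stand-in for a tree node.
struct node @{ std::string name; bool is_class; @};

typedef std::string (*printer) (const node &, bool verbose);

static std::string print_decl (const node &n, bool) @{ return n.name; @}
static std::string print_type (const node &n, bool verbose)
@{
  return (verbose && n.is_class ? "class " : "") + n.name;
@}

// Map a conversion letter to its printer; 0 means "not handled here".
static printer printer_for (char c)
@{
  switch (c)
    @{
    case 'D': return print_decl;
    case 'T': return print_type;
    default:  return 0;
    @}
@}

// Expand %D, %T, %#D and %#T in FMT, taking nodes from ARGS in order.
// Anything else is copied through unchanged.
std::string format_diag (const std::string &fmt, const node *args)
@{
  std::string out;
  for (std::string::size_type i = 0; i < fmt.size (); ++i)
    @{
      if (fmt[i] != '%' || i + 1 >= fmt.size ())
        @{ out += fmt[i]; continue; @}
      bool verbose = fmt[i + 1] == '#';
      std::string::size_type j = i + 1 + (verbose ? 1 : 0);
      printer p = j < fmt.size () ? printer_for (fmt[j]) : 0;
      if (!p)
        @{ out += fmt[i]; continue; @}
      out += p (*args++, verbose);
      i = j;
    @}
  return out;
@}
@end example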
 931
 932 @node Parser, Exception Handling, Error Reporting, Top
 933 @section Parser
 934
 935 Some comments on the parser:
 936
 937 The @code{after_type_declarator} / @code{notype_declarator} hack is
 938 necessary in order to allow redeclarations of @code{TYPENAME}s, for
 939 instance
 940
 941 @example
 942 typedef int foo;
 943 class A @{
 944   char *foo;
 945 @};
 946 @end example
 947
 948 In the above, the first @code{foo} is parsed as a @code{notype_declarator},
 949 and the second as an @code{after_type_declarator}.
 950
 951 Ambiguities:
 952
 953 There are currently four reduce/reduce ambiguities in the parser.  They are:
 954
 955 1) Between @code{template_parm} and
 956 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
 957 identifier}.  This situation occurs in code looking like
 958
 959 @example
 960 template <class T> class A @{ @};
 961 @end example
 962
 963 It is ambiguous whether @code{class T} should be parsed as the
 964 declaration of a template type parameter named @code{T} or an unnamed
 965 constant parameter of type @code{class T}.  Section 14.6, paragraph 3 of
 966 the January '94 working paper states that the first interpretation is
 967 the correct one.  This ambiguity results in two reduce/reduce conflicts.
 968
 969 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
 970 in places where both can be accepted, such as the argument to
 971 @code{sizeof}.  Section 8.1 of the pre-San Diego working paper specifies
 972 that these ambiguous constructs will be interpreted as @code{typename}s.
 973 This ambiguity results in six reduce/reduce conflicts between
 974 @samp{absdcl} and @samp{functional_cast}.
 975
 976 3) Between @code{functional_cast} and
 977 @code{complex_direct_notype_declarator}, for various token strings.
 978 This situation occurs in code looking like
 979
 980 @example
 981 int (*a);
 982 @end example
 983
 984 This code is ambiguous; it could be a declaration of the variable
 985 @samp{a} as a pointer to @samp{int}, or it could be a functional cast of
 986 @samp{*a} to @samp{int}.  Section 6.8 specifies that the former
 987 interpretation is correct.  This ambiguity results in 7 reduce/reduce
 988 conflicts.  Another aspect of this ambiguity is code like 'int (x[2]);',
 989 which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
 990 between @samp{direct_notype_declarator} and
 991 @samp{primary}/@samp{overqualified_id}.  Finally, there are 4 r/r
 992 conflicts between @samp{expr_or_declarator} and @samp{primary} over code
 993 like 'int (a);', which could probably be resolved but would also
 994 probably be more trouble than it's worth.  In all, this situation
 995 accounts for 17 conflicts.  Ack!
 996
 997 The second case above is responsible for the failure to parse 'LinppFile
 998 ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
 999 Math.h++) as an object declaration, and must be fixed so that it does
1000 not resolve until later.
1001
1002 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
1003 type names.  This occurs in (as one example) code like
1004
1005 @example
1006 typedef int foo, bar;
1007 class A @{
1008   foo (bar);
1009 @};
1010 @end example
1011
1012 What is @code{bar} inside the class definition?  We currently interpret
1013 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1014 @code{after_type_declarator}.  I believe that xlC is correct, in light
1015 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1016 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1017 a @i{declaration}."  However, it seems clear that this rule must be
1018 violated in the case of constructors.  This ambiguity accounts for 8
1019 conflicts.
1020
1021 Unlike the others, this ambiguity is not recognized by the Working Paper.
1022
1023 @node  Exception Handling, Free Store, Parser, Top
1024 @section Exception Handling
1025
1026 Note, exception handling in g++ is still under development.
1027
1028 This section describes the mapping of C++ exceptions in the C++
1029 front-end, into the back-end exception handling framework.
1030
1031 The basic mechanism of exception handling in the back-end is
1032 unwind-protect a la elisp.  This is a general, robust, and language
1033 independent representation for exceptions.
1034
1035 The C++ front-end exceptions are mapped into the unwind-protect
1036 semantics by the C++ front-end.  The mapping is described below.
1037
1038 When -frtti is used, rtti is used to do exception object type checking;
1039 when it isn't used, the encoded name for the type of the object being
1040 thrown is used instead.  All code that originates exceptions, even code
1041 that throws exceptions as a side effect, like dynamic casting, and all
1042 code that catches exceptions must be compiled consistently with either
1043 -frtti or -fno-rtti.  It is not possible to mix rtti-based exception
1044 handling objects with code that doesn't use rtti.  The exceptions to this are
1045 code that doesn't catch or throw exceptions, catch (...), and code that
1046 just rethrows an exception.
1047
1048 Currently we use the normal mangling used in building function names
1049 (int's are "i", const char * is PCc) to build the non-rtti base type
1050 descriptors for exception handling.  These descriptors are just plain
1051 NULL terminated strings, and internally they are passed around as char
1052 *.
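
One consequence of using plain strings is that two descriptors can be
compared with an ordinary string comparison.  A minimal sketch, assuming
a hypothetical helper named @code{exact_type_match}; the run-time
routine that g++ actually uses is not shown here:

@example
#include <cstring>

// Hypothetical helper: compare two non-rtti type descriptors, which
// are NUL-terminated mangled type names such as "i" or "PCc".
bool exact_type_match (const char *catch_type, const char *throw_type)
@{
  return std::strcmp (catch_type, throw_type) == 0;
@}
@end example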
1053
1054 In C++, all cleanups should be protected by exception regions.  The
1055 region starts just after the reason why the cleanup is created has
1056 ended.  For example, with an automatic variable, that has a constructor,
1057 it would be right after the constructor is run.  The region ends just
1058 before the finalization is expanded.  Since the backend may expand the
1059 cleanup multiple times along different paths, once for normal end of the
1060 region, once for non-local gotos, once for returns, etc, the backend
1061 must take special care to protect the finalization expansion, if the
1062 expansion is for any other reason than normal region end, and it is
1063 `inline' (it is inside the exception region).  The backend can either
1064 choose to move them out of line, or it can create an exception region
1065 over the finalization to protect it, and in the handler associated with
1066 it, it would not run the finalization as it otherwise would have, but
1067 rather just rethrow to the outer handler, careful to skip the normal
1068 handler for the original region.
1069
1070 In Ada, they will use the more runtime intensive approach of having
1071 fewer regions, but at the cost of additional work at run time, to keep a
1072 list of things that need cleanups.  When a variable has finished
1073 construction, they add the cleanup to the list; when they come to the end
1074 of the lifetime of the variable, they run the list down.  If they take a
1075 hit before the section finishes normally, they examine the list for
1076 actions to perform.  I hope they add this logic into the back-end, as it
1077 would be nice to get that alternative approach in C++.
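
The list-based approach can be sketched in C++ as follows.  This is only
an illustration of the idea; @code{cleanup_list}, @code{cleanup_entry}
and @code{cleanup_fn} are hypothetical names, and nothing like this
exists in the back-end described here.

@example
#include <vector>

// Hypothetical run-time cleanup list, in the style described for Ada.
typedef void (*cleanup_fn) (void *);

struct cleanup_entry @{ cleanup_fn fn; void *obj; @};

class cleanup_list
@{
  std::vector<cleanup_entry> entries_;
public:
  // Call once an object has finished construction.
  void add (cleanup_fn fn, void *obj)
  @{
    cleanup_entry e = @{ fn, obj @};
    entries_.push_back (e);
  @}

  // Run the list down, most recently added entry first.  The same
  // routine serves the normal end of a lifetime and an exception.
  void run ()
  @{
    while (!entries_.empty ())
      @{
        cleanup_entry e = entries_.back ();
        entries_.pop_back ();
        e.fn (e.obj);
      @}
  @}
@};
@end example

The trade-off is the one stated above: fewer exception regions in the
generated code, paid for by the bookkeeping done in @code{add} at run
time.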
1078
1079 On an rs6000, xlC stores exception objects on the stack, under the try
1080 block.  When it unwinds down into a handler, the frame pointer is
1081 adjusted back to the normal value for the frame in which the handler
1082 resides, and the stack pointer is left unchanged from the time at which
1083 the object was thrown.  This is so that there is always someplace for
1084 the exception object, and nothing can overwrite it, once we start
1085 throwing.  The only bad part, is that the stack remains large.
1086
1087 The below points out some things that work in g++'s exception handling.
1088
1089 All completely constructed temps and local variables are cleaned up in
1090 all unwound scopes.  Completely constructed parts of partially
1091 constructed objects are cleaned up.  This includes partially built
1092 arrays.  Exception specifications are now handled.  Thrown objects are
1093 now cleaned up all the time.  We can now tell if we have an active
1094 exception being thrown or not (__eh_type != 0).  We use this to call
1095 terminate if someone does a throw; without there being an active
1096 exception object.  uncaught_exception () works.  Exception handling
1097 should work right if you optimize.  Exception handling should work with
1098 -fpic or -fPIC.
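
The cleanup of completely constructed parts of a partially constructed
object can be observed with a small program.  The sketch below relies
only on standard C++ behavior; the names @code{part}, @code{whole},
@code{trace} and @code{build_and_fail} are made up for the example.

@example
#include <stdexcept>
#include <string>

std::string trace;   // records which destructors ran, in order

struct part
@{
  char id;
  explicit part (char c) : id (c) @{@}
  ~part () @{ trace += id; @}
@};

struct whole
@{
  part a;
  part b;
  // Both members are complete when the constructor body throws.
  whole () : a ('a'), b ('b') @{ throw std::runtime_error ("ctor failed"); @}
  ~whole () @{ trace += 'w'; @}   // not run: whole was never complete
@};

// Returns the destructor trace after a failed construction.
std::string build_and_fail ()
@{
  trace.clear ();
  try @{ whole w; @}
  catch (const std::runtime_error &) @{ trace += '!'; @}
  return trace;
@}
@end example

The members are destroyed in reverse order of construction before the
handler is entered, and the destructor of @code{whole} itself is never
called.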
1099
1100 The below points out some flaws in g++'s exception handling, as it now
1101 stands.
1102
1103 Only exact type matching or reference matching of throw types works when
1104 -fno-rtti is used.  Exception handling only works on SPARC (like Suns) (both -mflat and
1105 -mno-flat models work), SPARClite, Hitachi SH, i386, arm, rs6000,
1106 PowerPC, Alpha, mips, VAX, m68k and z8k machines.  SPARC v9 may not
1107 work.  HPPA is mostly done, but throwing between a shared library and
1108 user code doesn't yet work.  Some targets have support for data-driven
1109 unwinding.  Partial support is in for all other machines, but a stack
1110 unwinder called __unwind_function has to be written, and added to
1111 libgcc2 for them.  The new EH code doesn't rely upon the
1112 __unwind_function for C++ code; instead it creates per-function
1113 unwinders right inside the function.  Unfortunately, on many platforms
1114 the definition of RETURN_ADDR_RTX in the tm.h file for the machine port
1115 is wrong.  See below for details on __unwind_function.  RTL_EXPRs for EH
1116 cond variables for && and || exprs should probably be wrapped in
1117 UNSAVE_EXPRs, and RTL_EXPRs tweaked so that they can be unsaved.
1118
1119 We only do pointer conversions on exception matching a la 15.3 p2 case
1120 3: `A handler with type T, const T, T&, or const T& is a match for a
1121 throw-expression with an object of type E if [3]T is a pointer type and
1122 E is a pointer type that can be converted to T by a standard pointer
1123 conversion (_conv.ptr_) not involving conversions to pointers to private
1124 or protected base classes.' when -frtti is given.
1125
1126 We don't call delete on new expressions that die because the ctor threw
1127 an exception.  See except/18 for a test case.
1128
1129 15.2 para 13: The exception being handled should be rethrown if control
1130 reaches the end of a handler of the function-try-block of a constructor
1131 or destructor; right now, it is not.
1132
1133 15.2 para 12: If a return statement appears in a handler of
1134 function-try-block of a constructor, the program is ill-formed, but this
1135 isn't diagnosed.
1136
1137 15.2 para 11: If the handlers of a function-try-block contain a jump
1138 into the body of a constructor or destructor, the program is ill-formed,
1139 but this isn't diagnosed.
1140
1141 15.2 para 9: Check that the fully constructed base classes and members
1142 of an object are destroyed before entering the handler of a
1143 function-try-block of a constructor or destructor for that object.
1144
1145 build_exception_variant should sort the incoming list, so that it
1146 implements set compares, not exact list equality.  Type smashing should
1147 smash exception specifications using set union.
1148
1149 Thrown objects are usually allocated on the heap, in the usual way.  If
1150 one runs out of heap space, throwing an object will probably never work.
1151 This could be relaxed some by passing an __in_chrg parameter to track
1152 who has control over the exception object.  Thrown objects are not
1153 allocated on the heap when they are pointer to object types.  We should
1154 extend it so that all small (<4*sizeof(void*)) objects are stored
1155 directly, instead of allocated on the heap.
1156
1157 When the backend returns a value, it can create new exception regions
1158 that need protecting.  The new region should rethrow the object in
1159 context of the last associated cleanup that ran to completion.
1160
1161 The structure of the code that is generated for C++ exception handling
1162 code is shown below:
1163
1164 @example
1165 Ln:                                     throw value;
1166         copy value onto heap
1167         jump throw (Ln, id, address of copy of value on heap)
1168
1169                                         try @{
1170 +Lstart:        the start of the main EH region
1171 |...                                            ...
1172 +Lend:          the end of the main EH region
1173                                         @} catch (T o) @{
1174                                                 ...1
1175                                         @}
1176 Lresume:
1177         nop     used to make sure there is something before
1178                 the next region ends, if there is one
1179 ...                                     ...
1180
1181         jump Ldone
1182 [
1183 Lmainhandler:    handler for the region Lstart-Lend
1184         cleanup
1185 ] zero or more, depending upon automatic vars with dtors
1186 +Lpartial:
1187 |        jump Lover
1188 +Lhere:
1189         rethrow (Lhere, same id, same obj);
1190 Lterm:          handler for the region Lpartial-Lhere
1191         call terminate
1192 Lover:
1193 [
1194  [
1195         call throw_type_match
1196         if (eq) @{
1197  ] these lines disappear when there is no catch condition
1198 +Lsregion2:
1199 |       ...1
1200 |       jump Lresume
1201 |Lhandler:      handler for the region Lsregion2-Leregion2
1202 |       rethrow (Lresume, same id, same obj);
1203 +Leregion2
1204         @}
1205 ] there are zero or more of these sections, depending upon how many
1206   catch clauses there are
1207 ----------------------------- expand_end_all_catch --------------------------
1208                 here we have fallen off the end of all catch
1209                 clauses, so we rethrow to outer
1210         rethrow (Lresume, same id, same obj);
1211 ----------------------------- expand_end_all_catch --------------------------
1212 [
1213 L1:     maybe throw routine
1214 ] depending upon if we have expanded it or not
1215 Ldone:
1216         ret
1217
1218 start_all_catch emits labels: Lresume,
1219
1220 @end example
1221
1222 The __unwind_function takes a pointer to the throw handler, and is
1223 expected to pop the stack frame that was built to call it, as well as
1224 the frame underneath and then jump to the throw handler.  It must
1225 restore all registers to their proper values as well as all other
1226 machine state as determined by the context in which we are unwinding
1227 into.  The way I normally start is to compile:
1228 @example
1229 void *g;
1230 void foo(void* a) @{ g = a; @}
1231 @end example
1232 with -S, and change the thing that alters the PC (return, or ret
1233 usually) to not alter the PC, making sure to leave all other semantics
1234 (like adjusting the stack pointer, or frame pointers) in.  After that,
1235 replicate the prologue once more at the end, again, changing the PC
1236 altering instructions, and finally, at the very end, jump to `g'.
1237
1238 It takes about a week to write this routine.  If someone wants to
1239 volunteer to write this routine for any architecture, exception support
1240 for that architecture will be added to g++.  Please send in those code
1241 donations.  One other thing that needs to be done, is to double check
1242 that __builtin_return_address (0) works.
1243
1244 @subsection Specific Targets
1245
1246 For the alpha, the __unwind_function will be something resembling:
1247
1248 @example
1249 void
1250 __unwind_function(void *ptr)
1251 @{
1252   /* First frame */
1253   asm ("ldq $15, 8($30)"); /* get the saved frame ptr; 15 is fp, 30 is sp */
1254   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1255
1256   /* Second frame */
1257   asm ("ldq $15, 8($30)"); /* fp */
1258   asm ("bis $15, $15, $30"); /* reload sp with the fp we found */
1259
1260   /* Return */
1261   asm ("ret $31, ($16), 1"); /* return to PTR, stored in a0 */
1262 @}
1263 @end example
1264
1265 @noindent
1266 However, there are a few problems preventing it from working.  First of
1267 all, the gcc-internal function @code{__builtin_return_address} needs to
1268 work given an argument of 0 for the alpha.  As it stands as of August
1269 30th, 1995, the code for @code{BUILT_IN_RETURN_ADDRESS} in @file{expr.c}
1270 will definitely not work on the alpha.  Instead, we need to define
1271 the macros @code{DYNAMIC_CHAIN_ADDRESS} (maybe),
1272 @code{RETURN_ADDR_IN_PREVIOUS_FRAME}, and definitely need a new
1273 definition for @code{RETURN_ADDR_RTX}.
1274
1275 In addition (and more importantly), we need a way to reliably find the
1276 frame pointer on the alpha.  The use of the value 8 above to restore the
1277 frame pointer (register 15) is incorrect.  On many systems, the frame
1278 pointer is consistently offset to a specific point on the stack.  On the
1279 alpha, however, the frame pointer is pushed last.  First the return
1280 address is stored, then any other registers are saved (e.g., @code{s0}),
1281 and finally the frame pointer is put in place.  So @code{fp} could have
1282 an offset of 8, but if the calling function saved any registers at all,
1283 they add to the offset.
1284
1285 The only places the frame size is noted are with the @samp{.frame}
1286 directive, for use by the debugger and the OSF exception handling model
1287 (useless to us), and in the initial computation of the new value for
1288 @code{sp}, the stack pointer.  For example, the function may start with:
1289
1290 @example
1291 lda $30,-32($30)
1292 .frame $15,32,$26,0
1293 @end example
1294
1295 @noindent
1296 The 32 above is exactly the value we need.  With this, we can be sure
1297 that the frame pointer is stored 8 bytes less---in this case, at 24(sp).
1298 The drawback is that there is no way that I (Brendan) have found to let
1299 us discover the size of a previous frame @emph{inside} the definition
1300 of @code{__unwind_function}.
1301
1302 So to accomplish exception handling support on the alpha, we need two
1303 things: first, a way to figure out where the frame pointer was stored,
1304 and second, a functional @code{__builtin_return_address} implementation
1305 for except.c to be able to use it.
1306
1307 Or just support DWARF 2 unwind info.
1308
1309 @subsection New Backend Exception Support
1310
1311 This subsection discusses various aspects of the design of the
1312 data-driven model being implemented for the exception handling backend.
1313
1314 The goal is to generate enough data during the compilation of user code,
1315 such that we can dynamically unwind through functions at run time with a
1316 single routine (@code{__throw}) that lives in libgcc.a, built by the
1317 compiler, and dispatch into associated exception handlers.
1318
1319 This information is generated by the DWARF 2 debugging backend, and
1320 includes all of the information __throw needs to unwind an arbitrary
1321 frame.  It specifies where all of the saved registers and the return
1322 address can be found at any point in the function.
1323
1324 Major disadvantages when enabling exceptions are:
1325
1326 @itemize @bullet
1327 @item
1328 Code cannot use caller-saved registers when flow can be
1329 transferred into that code from an exception handler.  In high performance
1330 code this should not usually be true, so the effects should be minimal.
1331
1332 @end itemize
1333
1334 @subsection Backend Exception Support
1335
1336 The backend must be extended to fully support exceptions.  Right now
1337 there are a few hooks into the alpha exception handling backend that
1338 resides in the C++ frontend from that backend that allows exception
1339 handling to work in g++.  An exception region is a segment of generated
1340 code that has a handler associated with it.  The exception regions are
1341 denoted in the generated code as address ranges denoted by a starting PC
1342 value and an ending PC value of the region.  Some of the limitations
1343 with this scheme are:
1344
1345 @itemize @bullet
1346 @item
1347 The backend replicates insns for such things as loop unrolling and
1348 function inlining.  Right now, there are no hooks into the frontend's
1349 exception handling backend to handle the replication of insns.  When
1350 replication happens, a new exception region descriptor needs to be
1351 generated for the new region.
1352
1353 @item
1354 The backend expects to be able to rearrange code, for things like jump
1355 optimization.  Any rearranging of the code needs to have exception region
1356 descriptors updated appropriately.
1357
1358 @item
1359 The backend can eliminate dead code.  Any associated exception region
1360 descriptor that refers to fully contained code that has been eliminated
1361 should also be removed, although not doing this is harmless in terms of
1362 semantics.
1363
1364 @end itemize
1365
1366 The above is not meant to be exhaustive, but does include all things I
1367 have thought of so far.  I am sure other limitations exist.
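
The address-range representation of exception regions described earlier
in this subsection can be sketched as a table lookup.  The layout below
is an assumption made for illustration: @code{eh_region} and
@code{find_handler} are hypothetical names, the ending PC is treated as
exclusive, and inner regions are assumed to be listed before the regions
that enclose them.

@example
#include <cstddef>

// Hypothetical descriptor: a PC range plus the PC of its handler.
struct eh_region @{ unsigned long start_pc, end_pc, handler_pc; @};

// Return the handler of the first region containing PC, or 0 if no
// region covers it.  With inner regions listed first, the first
// match is also the innermost one.
unsigned long find_handler (const eh_region *table, std::size_t n,
                            unsigned long pc)
@{
  for (std::size_t i = 0; i < n; ++i)
    if (table[i].start_pc <= pc && pc < table[i].end_pc)
      return table[i].handler_pc;
  return 0;
@}
@end example

A table of this kind is what has to be kept up to date when insns are
replicated, rearranged or deleted, which is the source of the
limitations listed above.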
1368
1369 Below are some notes on the migration of the exception handling code
1370 backend from the C++ frontend to the backend.
1371
1372 NOTEs are to be used to denote the start of an exception region, and the
1373 end of the region.  I presume that the interface used to generate these
1374 notes in the backend would be two functions, start_exception_region and
1375 end_exception_region (or something like that).  The frontends are
1376 required to call them in pairs.  When marking the end of a region, an
1377 argument can be passed to indicate the handler for the marked region.
1378 This can be passed in many ways, currently a tree is used.  Another
1379 possibility would be insns for the handler, or a label that denotes a
1380 handler.  I have a feeling insns might be the best way to pass it.
1381 Semantics are, if an exception is thrown inside the region, control is
1382 transferred unconditionally to the handler.  If control passes through
1383 the handler, then the backend is to rethrow the exception, in the
1384 context of the end of the original region.  The handler is protected by
1385 the conventional mechanisms; it is the frontend's responsibility to
1386 protect the handler, if special semantics are required.
1387
1388 This is a very low level view, and it would be nice if the backend
1389 supported a somewhat higher level view in addition to this view.  This
1390 higher level could include source line number, name of the source file,
1391 name of the language that threw the exception and possibly the name of
1392 the exception.  Kenner may want to rope you into doing more than just
1393 the basics required by C++.  You will have to resolve this.  He may want
1394 you to do support for non-local gotos, first scan for exception handler,
1395 if none is found, allow the debugger to be entered, without any cleanups
1396 being done.  To do this, the backend would have to know the difference
1397 between a cleanup-rethrower, and a real handler; it would also have to
1398 have a way to know if a handler `matches' a thrown exception, and this
1399 is frontend specific.
1400
1401 The stack unwinder is one of the hardest parts to do.  It is highly
1402 machine dependent.  The form that Kenner seems to like was a couple of
1403 macros, that would do the machine dependent grunt work.  One preexisting
1404 function that might be of some use is __builtin_return_address ().  One
1405 macro he seemed to want was __builtin_return_address, and the other
1406 would do the hard work of fixing up the registers, adjusting the stack
1407 pointer, frame pointer, arg pointer and so on.
1408
1409
1410 @node Free Store, Mangling, Exception Handling, Top
1411 @section Free Store
1412
1413 @code{operator new []} adds a magic cookie to the beginning of arrays
1414 for which the number of elements will be needed by @code{operator delete
1415 []}.  These are arrays of objects with destructors and arrays of objects
1416 that define @code{operator delete []} with the optional size_t argument.
1417 This cookie can be examined from a program as follows:
1418
1419 @example
1420 typedef unsigned long size_t;
1421 extern "C" int printf (const char *, ...);
1422
1423 size_t nelts (void *p)
1424 @{
1425   struct cookie @{
1426     size_t nelts __attribute__ ((aligned (sizeof (double))));
1427   @};
1428
1429   cookie *cp = (cookie *)p;
1430   --cp;
1431
1432   return cp->nelts;
1433 @}
1434
1435 struct A @{
1436   ~A() @{ @}
1437 @};
1438
1439 int main()
1440 @{
1441   A *ap = new A[3];
1442   printf ("%lu\n", nelts (ap));
1443 @}
1444 @end example
1445
1446 @section Linkage
1447 The linkage code in g++ is horribly twisted in order to meet two design goals:
1448
1449 1) Avoid unnecessary emission of inlines and vtables.
1450
1451 2) Support pedantic assemblers like the one in AIX.
1452
1453 To meet the first goal, we defer emission of inlines and vtables until
1454 the end of the translation unit, where we can decide whether or not they
1455 are needed, and how to emit them if they are.
1456
1457 @node Mangling, Concept Index, Free Store, Top
1458 @section Function name mangling for C++ and Java
1459
1460 Both C++ and Java provide overloaded functions and methods,
1461 which are methods with the same name but different parameter lists.
1462 Selecting the correct version is done at compile time.
1463 Though the overloaded functions have the same name in the source code,
1464 they need to be translated into different assembler-level names,
1465 since typical assemblers and linkers cannot handle overloading.
1466 This process of encoding the parameter types with the method name
1467 into a unique name is called @dfn{name mangling}.  The inverse
1468 process is called @dfn{demangling}.
1469
1470 It is convenient that C++ and Java use compatible mangling schemes,
1471 since that makes life easier for tools such as gdb, and it eases
1472 integration between C++ and Java.
1473
1474 Note there is also a standard "Java Native Interface" (JNI) which
1475 implements a different calling convention, and uses a different
1476 mangling scheme.  The JNI is a rather abstract ABI so Java can call methods
1477 written in C or C++;
1478 we are concerned here about a lower-level interface primarily
1479 intended for methods written in Java, but that can also be used for C++
1480 (and less easily C).
1481
1482 Note that on systems that follow BSD tradition, a C identifier @code{var}
1483 would get "mangled" into the assembler name @samp{_var}.  On such
1484 systems, all other mangled names are also prefixed by a @samp{_}
1485 which is not shown in the following examples.
1486
1487 @subsection Method name mangling
1488
1489 C++ mangles a method by emitting the function name, followed by @code{__},
1490 followed by encodings of any method qualifiers (such as @code{const}),
1491 followed by the mangling of the method's class,
1492 followed by the mangling of the parameters, in order.
1493
1494 For example @code{Foo::bar(int, long) const} is mangled
1495 as @samp{bar__C3Fooil}.
1496
1497 For a constructor, the method name is left out.
1498 That is @code{Foo::Foo(int, long)} is mangled
1499 as @samp{__3Fooil}.
1500
1501 GNU Java does the same.
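
The concatenation rule above can be written down directly.  The helper
@code{mangle_method} is hypothetical and assumes that the class and the
parameters have already been mangled into their codes:

@example
#include <string>

// Hypothetical helper following the scheme described above.
// QUALS is, for example, "C" for const.  CLASS_CODE and PARAM_CODES
// are assumed to be already mangled (for example "3Foo" and "il").
// Pass an empty NAME for a constructor.
std::string mangle_method (const std::string &name,
                           const std::string &quals,
                           const std::string &class_code,
                           const std::string &param_codes)
@{
  return name + "__" + quals + class_code + param_codes;
@}
@end example

With these inputs, @code{mangle_method ("bar", "C", "3Foo", "il")}
yields @samp{bar__C3Fooil}.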
1502
1503 @subsection Primitive types
1504
1505 The C++ types @code{int}, @code{long}, @code{short}, @code{char},
1506 and @code{long long} are mangled as @samp{i}, @samp{l},
1507 @samp{s}, @samp{c}, and @samp{x}, respectively.
1508 The corresponding unsigned types have @samp{U} prefixed
1509 to the mangling.  The type @code{signed char} is mangled @samp{Sc}.
1510
1511 The C++ and Java floating-point types @code{float} and @code{double}
1512 are mangled as @samp{f} and @samp{d} respectively.
1513
1514 The C++ @code{bool} type and the Java @code{boolean} type are
1515 mangled as @samp{b}.
1516
1517 The C++ @code{wchar_t} and the Java @code{char} types are
1518 mangled as @samp{w}.
1519
1520 The Java integral types @code{byte}, @code{short}, @code{int}
1521 and @code{long} are mangled as @samp{c}, @samp{s}, @samp{i},
1522 and @samp{x}, respectively.
1523
1524 C++ code that has included @code{javatypes.h} will mangle
1525 the typedefs  @code{jbyte}, @code{jshort}, @code{jint}
1526 and @code{jlong} as respectively @samp{c}, @samp{s}, @samp{i},
1527 and @samp{x}.  (This has not been implemented yet.)
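
The C++ codes listed in this subsection can be collected into a small
lookup.  This is a sketch for illustration, not the code used by the
compiler; @code{mangle_primitive} and its helpers are hypothetical
names, and the type is passed as a spelled-out string for simplicity.

@example
#include <string>

namespace @{
// The first five entries are the types that have unsigned variants.
const char *const codes[][2] = @{
  @{ "int", "i" @}, @{ "long", "l" @}, @{ "short", "s" @}, @{ "char", "c" @},
  @{ "long long", "x" @}, @{ "float", "f" @}, @{ "double", "d" @},
  @{ "bool", "b" @}, @{ "wchar_t", "w" @}, @{ "signed char", "Sc" @}
@};
const unsigned n_integral = 5;
const unsigned n_codes = sizeof codes / sizeof codes[0];

// Search the first LIMIT entries; return "" when TYPE is not found.
std::string lookup (const std::string &type, unsigned limit)
@{
  for (unsigned i = 0; i < limit; ++i)
    if (type == codes[i][0])
      return codes[i][1];
  return "";
@}
@}

// Return the code for a C++ primitive type, or "" if this sketch
// does not know the type.
std::string mangle_primitive (const std::string &type)
@{
  const std::string uns = "unsigned ";
  if (type.compare (0, uns.size (), uns) == 0)
    @{
      std::string base = lookup (type.substr (uns.size ()), n_integral);
      return base.empty () ? base : "U" + base;
    @}
  return lookup (type, n_codes);
@}
@end example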
1528
1529 @subsection Mangling of simple names
1530
1531 A simple class, package, template, or namespace name is
1532 encoded as the number of characters in the name, followed by
1533 the actual characters.  Thus the class @code{Foo}
1534 is encoded as @samp{3Foo}.
1535
1536 If any of the characters in the name are not alphanumeric
1537 (i.e., not one of the standard ASCII letters, digits, or '_'),
1538 or the initial character is a digit, then the name is
1539 mangled as a sequence of encoded Unicode letters.
1540 A Unicode encoding starts with a @samp{U} to indicate
1541 that Unicode escapes are used, followed by the number of
1542 bytes used by the Unicode encoding, followed by the bytes
1543 representing the encoding.  ASCII letters and
1544 non-initial digits are encoded without change.  However, all
1545 other characters (including underscore and initial digits) are
1546 translated into a sequence starting with an underscore,
1547 followed by the big-endian 4-hex-digit lower-case encoding of the character.
1548
1549 If a method name contains Unicode-escaped characters, the
1550 entire mangled method name is followed by a @samp{U}.
1551
1552 For example, the method @code{X\u0319::M\u002B(int)} is encoded as
1553 @samp{M_002b__U6X_0319iU}.
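
The rules for class, package, template and namespace names can be
sketched as one function.  This is an illustration written from the
description above, not the compiler's implementation;
@code{mangle_simple_name} is a hypothetical name, and the input is
assumed to be a sequence of 16-bit Unicode code units.

@example
#include <cstdio>
#include <string>

// Hypothetical encoder for a simple name given as 16-bit code units.
std::string mangle_simple_name (const std::u16string &name)
@{
  std::string plain, escaped;
  bool need_escape = false;
  for (std::u16string::size_type i = 0; i < name.size (); ++i)
    @{
      char16_t c = name[i];
      bool letter = (c >= u'a' && c <= u'z') || (c >= u'A' && c <= u'Z');
      bool digit = c >= u'0' && c <= u'9';
      // The plain form allows letters, '_' and non-initial digits.
      if (!(letter || c == u'_' || (digit && i > 0)))
        need_escape = true;
      if (letter || digit || c == u'_')
        plain += static_cast<char> (c);
      // The escaped form keeps letters and non-initial digits and
      // writes everything else as '_' plus four lower-case hex digits.
      if (letter || (digit && i > 0))
        escaped += static_cast<char> (c);
      else
        @{
          char buf[8];
          std::snprintf (buf, sizeof buf, "_%04x",
                         static_cast<unsigned> (c));
          escaped += buf;
        @}
    @}
  const std::string &body = need_escape ? escaped : plain;
  return (need_escape ? "U" : "") + std::to_string (body.size ()) + body;
@}
@end example

For the class in the example above, whose name is @code{X} followed by
the character U+0319, this produces @samp{U6X_0319}.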
1554
1555
1556 @subsection Pointer and reference types
1557
1558 A C++ pointer type is mangled as @samp{P} followed by the
1559 mangling of the type pointed to.
1560
1561 A C++ reference type is mangled as @samp{R} followed by the
1562 mangling of the type referenced.
1563
1564 A Java object reference type is equivalent
1565 to a C++ pointer parameter, so we mangle such a parameter type
1566 as @samp{P} followed by the mangling of the class name.
1567
1568 @subsection Squangled type compression
1569
1570 Squangling (enabled with the @samp{-fsquangle} option) utilizes the
1571 @samp{B} code to indicate reuse of a previously seen type within an
1572 identifier.  Types are recognized in a left to right manner and given
1573 increasing values, which are appended to the code in the standard
1574 manner; that is, multiple digit numbers are delimited by @samp{_}
1575 characters.  A type is considered to be any non primitive type,
1576 regardless of whether it's a parameter, template parameter, or entire
1577 template. Certain codes are considered modifiers of a type, and are not
1578 included as part of the type. These are the @samp{C}, @samp{V},
1579 @samp{P}, @samp{A}, @samp{R}, @samp{U} and @samp{u} codes, denoting
1580 constant, volatile, pointer, array, reference, unsigned, and restrict.
1581 These codes may precede a @samp{B} type in order to make the required
1582 modifications to the type.
1583
1584 For example:
1585 @example
1586 template <class T> class class1 @{ @};
1587
1588 template <class T> class class2 @{ @};
1589
1590 class class3 @{ @};
1591
1592 int f(class2<class1<class3> > a ,int b, const class1<class3>&c, class3 *d) @{ @}
1593
1594     B0 -> class2<class1<class3>
1595     B1 -> class1<class3>
1596     B2 -> class3
1597 @end example
1598 Produces the mangled name @samp{f__FGt6class21Zt6class11Z6class3iRCB1PB2}.
1599 The int parameter is a basic type, and does not receive a B encoding...
1600
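The numbering of @samp{B} types in the example above can be modelled
with a small Python sketch.  It is a toy under stated assumptions: types
are represented as hypothetical @code{(name, args)} tuples, primitive
types are plain strings, modifiers such as @samp{C}, @samp{R} and
@samp{P} are assumed to have been stripped by the caller, and the
outermost-first ordering is inferred from the example only.

```python
def number_types(parameter_types):
    """Assign B indices to non-primitive types, outermost first, left to right.

    Each type is a (name, args) tuple, where args is a tuple of nested
    types; primitive types are plain strings and receive no index.
    """
    table = []

    def visit(t):
        if isinstance(t, str):      # primitive type such as "int"
            return
        if t not in table:
            table.append(t)         # position in the list is the B number
        for arg in t[1]:
            visit(arg)

    for t in parameter_types:
        visit(t)
    return table


class3 = ("class3", ())
class1 = ("class1", (class3,))
class2 = ("class2", (class1,))

# Parameters of f() with the const, reference and pointer modifiers removed.
table = number_types([class2, "int", class1, class3])
for index, entry in enumerate(table):
    print("B%d -> %s" % (index, entry[0]))
```

With this table, the third and fourth parameters find their types at
indices 1 and 2, which corresponds to the @samp{RCB1} and @samp{PB2}
fragments of the mangled name.
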
@subsection Qualified names

Both C++ and Java allow a class to be lexically nested inside another
class.  C++ also supports namespaces.
Java also supports packages.

These are all mangled the same way:  First the letter @samp{Q}
indicates that we are emitting a qualified name.
That is followed by the number of parts in the qualified name.
If that number is 9 or less, it is emitted with no delimiters.
Otherwise, an underscore is written before and after the count.
Then follows each part of the qualified name, as described above.

For example, @code{Foo::\u0319::Bar} is encoded as
@samp{Q33FooU5_03193Bar}.

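A minimal Python sketch of the @samp{Q} rule follows.  The helper names
(@code{mangle_count}, @code{mangle_qualified},
@code{mangle_ascii_parts}) are hypothetical, and the parts are assumed
to be already mangled as simple names, or to be plain ASCII in the
convenience wrapper.

```python
def mangle_count(count):
    """Counts above 9 are wrapped in underscores so they stay unambiguous."""
    if count <= 9:
        return str(count)
    return "_" + str(count) + "_"


def mangle_qualified(mangled_parts):
    """Join already-mangled simple names into a Q-qualified name."""
    return "Q" + mangle_count(len(mangled_parts)) + "".join(mangled_parts)


def mangle_ascii_parts(name, separator):
    """Convenience wrapper for plain ASCII names such as java.lang.String."""
    parts = name.split(separator)
    return mangle_qualified([str(len(p)) + p for p in parts])


print(mangle_qualified(["3Foo", "U5_0319", "3Bar"]))   # Q33FooU5_03193Bar
print(mangle_ascii_parts("java.lang.String", "."))     # Q34java4lang6String
```

The underscores around a multi-digit count are what keep the count from
running into the length prefix of the first part.
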
Squangling uses the letter @samp{K} to indicate a
remembered portion of a qualified name.  As qualified names are processed
for an identifier, the names are numbered and remembered in a
manner similar to the @samp{B} type compression code.
Names are recognized left to right, and given increasing values, which are
appended to the code in the standard manner; i.e., multiple-digit numbers
are delimited by @samp{_} characters.

For example:
@example
class Andrew
@{
public:
  class WasHere
  @{
  public:
    class AndHereToo
    @{
    @};
  @};
@};

void f(Andrew& r1, Andrew::WasHere& r2, Andrew::WasHere::AndHereToo& r3) @{ @}

   K0 ->  Andrew
   K1 ->  Andrew::WasHere
   K2 ->  Andrew::WasHere::AndHereToo
@end example
Function @samp{f()} would be mangled as
@samp{f__FR6AndrewRQ2K07WasHereRQ2K110AndHereToo}.

When either a @samp{B} or a @samp{K} code could
be chosen, preference is always given to the @samp{B} code.  For instance, the example
in the section on @samp{B} mangling could have used a @samp{K} code
instead of @samp{B2}.

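The @samp{K} example above can be reproduced with the following Python
sketch.  It is a toy model inferred from that one example, not the
compiler's algorithm: the function name @code{squangle_qualified} is
hypothetical, every new prefix of a name is assumed to be remembered
(shortest first), only ASCII parts are handled, and indices and counts
above 9 are not modelled.

```python
def squangle_qualified(names):
    """Mangle scoped names in order, reusing remembered prefixes via K codes.

    Each name is a tuple of ASCII parts, e.g. ("Andrew", "WasHere").
    """
    remembered = []     # position in this list is the K number
    results = []
    for parts in names:
        # Find the longest proper prefix that has already been remembered.
        known = 0
        for size in range(len(parts) - 1, 0, -1):
            if parts[:size] in remembered:
                known = size
                break
        tail = [str(len(p)) + p for p in parts[known:]]
        if known:
            index = remembered.index(parts[:known])
            if index > 9:
                raise ValueError("multi-digit K indices are not modelled")
            pieces = ["K" + str(index)] + tail
        else:
            pieces = tail
        if len(pieces) > 9:
            raise ValueError("multi-digit Q counts are not modelled")
        if len(pieces) > 1:
            results.append("Q" + str(len(pieces)) + "".join(pieces))
        else:
            results.append(pieces[0])
        # Remember every prefix of this name that is new, shortest first.
        for size in range(1, len(parts) + 1):
            if parts[:size] not in remembered:
                remembered.append(parts[:size])
    return results


params = squangle_qualified([
    ("Andrew",),
    ("Andrew", "WasHere"),
    ("Andrew", "WasHere", "AndHereToo"),
])
print("f__F" + "".join("R" + p for p in params))
```

Note how @samp{K1} is immediately followed by the length prefix
@samp{10} in the last parameter; this is why larger indices need
delimiters.
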
@subsection Templates

A class template instantiation is encoded as the letter @samp{t},
followed by the encoding of the template name, followed by
the number of template parameters, followed by the encoding of the template
parameters.  If a template parameter is a type, it is written
as a @samp{Z} followed by the encoding of the type.  If it is a
template, it is encoded as @samp{z} followed by the parameter
of the template template parameter and the template name.

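For type parameters, the class template rule can be sketched in Python
as follows.  The function name @code{mangle_class_template} is
hypothetical, the type arguments are assumed to be already mangled, and
template template parameters (@samp{z}) and non-type parameters are not
handled.

```python
def mangle_class_template(template_name, mangled_type_args):
    """Encode template_name<args...> given already-mangled type arguments."""
    out = "t" + str(len(template_name)) + template_name
    out += str(len(mangled_type_args))
    for arg in mangled_type_args:
        out += "Z" + arg        # Z introduces a type argument
    return out


inner = mangle_class_template("class1", ["6class3"])
outer = mangle_class_template("class2", [inner])
print(inner)    # t6class11Z6class3
print(outer)    # t6class21Zt6class11Z6class3
```

The second string is the encoding of @code{class2<class1<class3> >}
that appears inside the mangled name in the section on squangled type
compression.
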
A function template specialization (either an instantiation or an
explicit specialization) is encoded by an @samp{H} followed by the
encoding of the template parameters, as described above, followed by an
@samp{_}, the encoding of the argument types to the template function
(not the specialization), another @samp{_}, and the return type.  (Like
the argument types, the return type is the return type of the function
template, not the specialization.)  Template parameters in the argument
and return types are encoded by an @samp{X} for type parameters,
@samp{zX} for template parameters,
or a @samp{Y} for constant parameters, followed by an index indicating their position
in the template parameter list declaration, and their template depth.

@subsection Arrays

C++ array types are mangled by emitting @samp{A}, followed by
the length of the array, followed by an @samp{_}, followed by
the mangling of the element type.  Of course, normally
array parameter types decay into pointer types, so you
don't see this.

Java arrays are objects.  A Java type @code{T[]} is mangled
as if it were the C++ type @code{JArray<T>}.
For example, @code{java.lang.String[]} is encoded as
@samp{Pt6JArray1ZPQ34java4lang6String}.

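The Java array example combines the pointer, template, and qualified
name rules.  A minimal Python sketch, with hypothetical helper names and
assuming plain ASCII names with fewer than ten parts:

```python
def mangle_java_class(dotted_name):
    """Mangle a package-qualified Java class name (ASCII parts only)."""
    simple = [str(len(p)) + p for p in dotted_name.split(".")]
    if len(simple) == 1:
        return simple[0]
    return "Q" + str(len(simple)) + "".join(simple)


def mangle_java_reference(dotted_name):
    """A Java object reference is mangled like a C++ pointer to the class."""
    return "P" + mangle_java_class(dotted_name)


def mangle_java_array(mangled_element):
    """T[] is treated as a reference to the C++ type JArray<T>."""
    return "P" + "t6JArray" + "1" + "Z" + mangled_element


element = mangle_java_reference("java.lang.String")
print(mangle_java_array(element))   # Pt6JArray1ZPQ34java4lang6String
```

The outer @samp{P} is there because the array itself is an object and is
therefore passed by reference, like any other Java object.
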
@subsection Static fields

Both C++ and Java classes can have static fields.
These are allocated statically, and are shared among all instances.

The mangling starts with a prefix (@samp{_} in most systems), which is
followed by the mangling
of the class name, followed by the ``joiner'' and finally the field name.
The joiner (see @code{JOINER} in @file{cp-tree.h}) is a special
separator character.  For historical reasons (and idiosyncrasies
of assembler syntax) it can be @samp{$} or @samp{.} (or even
@samp{_} on a few systems).  If the joiner is @samp{_} then the prefix
is @samp{__static_} instead of just @samp{_}.

For example, @code{Foo::Bar::var} (or @code{Foo.Bar.var} in Java syntax)
would be encoded as @samp{_Q23Foo3Bar$var} or @samp{_Q23Foo3Bar.var}
(or rarely @samp{__static_Q23Foo3Bar_var}).

If the name of a static variable needs Unicode escapes,
the Unicode indicator @samp{U} comes before the ``joiner''.
Thus @code{\u1234Foo::var\u3445} becomes @samp{_U8_1234FooU.var_3445}.

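The prefix and joiner rule for the plain ASCII case can be sketched in
Python as below.  The function name @code{mangle_static_field} is
hypothetical, the class name is assumed to be already mangled, and the
Unicode variant with the extra @samp{U} indicator is not handled.

```python
def mangle_static_field(mangled_class, field_name, joiner="$"):
    """Build the symbol name of a static field for a given joiner."""
    if joiner not in ("$", ".", "_"):
        raise ValueError("unexpected joiner: " + repr(joiner))
    # With '_' as joiner, a longer prefix is used instead of a bare '_'.
    prefix = "__static_" if joiner == "_" else "_"
    return prefix + mangled_class + joiner + field_name


for joiner in ("$", ".", "_"):
    print(mangle_static_field("Q23Foo3Bar", "var", joiner))
```

The three printed names correspond to the three encodings of
@code{Foo::Bar::var} given above.
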
@subsection Table of demangling code characters

The following special characters are used in mangling:

@table @samp
@item A
Indicates a C++ array type.

@item b
Encodes the C++ @code{bool} type,
and the Java @code{boolean} type.

@item B
Used for squangling.  Similar in concept to the @samp{T} non-squangled code.

@item c
Encodes the C++ @code{char} type, and the Java @code{byte} type.

@item C
A modifier to indicate a @code{const} type.
Also used to indicate a @code{const} member function
(in which case it precedes the encoding of the method's class).

@item d
Encodes the C++ and Java @code{double} types.

@item e
Indicates extra unknown arguments @code{...}.

@item E
Indicates the opening parenthesis of an expression.

@item f
Encodes the C++ and Java @code{float} types.

@item F
Used to indicate a function type.

@item H
Used to indicate a template function.

@item i
Encodes the C++ and Java @code{int} types.

@item I
Encodes typedef names of the form @code{int@var{n}_t}, where @var{n} is a
positive decimal number.  The @samp{I} is followed by either two
hexadecimal digits, which encode the value of @var{n}, or by an
arbitrary number of hexadecimal digits between underscores.  For
example, @samp{I40} encodes the type @code{int64_t}, and @samp{I_200_}
encodes the type @code{int512_t}.

@item J
Indicates a complex type.

@item K
Used by squangling to compress qualified names.

@item l
Encodes the C++ @code{long} type.

@item n
Immediate repeated type.  Followed by the repeat count.

@item N
Repeated type.  Followed by the repeat count of the repeated type,
followed by the type index of the repeated type.  Due to a bug in
g++ 2.7.2, this is only generated if index is 0.  Superseded by
@samp{n} when squangling.

@item O
Pointer-to-member type.

@item o
Vector type.

@item P
Indicates a pointer type.  Followed by the type pointed to.

@item Q
Used to mangle qualified names, which arise from nested classes.
Also used for namespaces.
In Java, used to mangle package-qualified names and inner classes.

@item r
Encodes the GNU C++ @code{long double} type.

@item R
Indicates a reference type.  Followed by the referenced type.

@item s
Encodes the C++ and Java @code{short} types.

@item S
A modifier that indicates that the following integer type is signed.
Only used with @code{char}.

Also used as a modifier to indicate a static member function.

@item t
Indicates a template instantiation.

@item T
A back reference to a previously seen type.

@item U
A modifier that indicates that the following integer type is unsigned.
Also used to indicate that the following class or namespace name
is encoded using Unicode-mangling.

@item u
The @code{restrict} type qualifier.

@item v
Encodes the C++ and Java @code{void} types.

@item V
A modifier for a @code{volatile} type or method.

@item w
Encodes the C++ @code{wchar_t} type, and the Java @code{char} type.

@item W
Indicates the closing parenthesis of an expression.

@item x
Encodes the GNU C++ @code{long long} type, and the Java @code{long} type.

@item X
Encodes a template type parameter, when part of a function type.

@item Y
Encodes a template constant parameter, when part of a function type.

@item z
Used for template template parameters.

@item Z
Used for template type parameters.

@end table

The letters @samp{G}, @samp{M}, @samp{O}, and @samp{p}
also seem to be used for obscure purposes ...

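The two forms of the @samp{I} code described in the table can be
illustrated with a short Python sketch.  The function names
@code{mangle_intn} and @code{demangle_intn} are hypothetical, and the
use of lower-case hexadecimal digits is an assumption; the two examples
in the table contain no letters, so they do not settle the case.

```python
def mangle_intn(bits):
    """Encode the typedef name int<bits>_t using the I code."""
    if bits <= 0:
        raise ValueError("bit width must be positive")
    if bits <= 0xFF:
        return "I%02x" % bits       # exactly two hex digits
    return "I_%x_" % bits           # longer values go between underscores


def demangle_intn(code):
    """Inverse of mangle_intn for a complete I code."""
    if not code.startswith("I"):
        raise ValueError("not an I code: " + repr(code))
    if code[1:2] == "_":
        digits = code[2:code.index("_", 2)]
    else:
        digits = code[1:3]
    return "int%d_t" % int(digits, 16)


print(mangle_intn(64))              # I40
print(mangle_intn(512))             # I_200_
print(demangle_intn("I_200_"))      # int512_t
```
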
@node Concept Index,  , Mangling, Top

@section Concept Index

@printindex cp

@bye