1 \input texinfo @c -*-texinfo-*-
3 @setfilename g++int.info
4 @settitle G++ internals
8 @node Top, Limitations of g++, (dir), (dir)
9 @chapter Internal Architecture of the Compiler
11 This is meant to describe the C++ front-end for gcc in detail.
12 Questions and comments to mrs@@cygnus.com.
15 * Limitations of g++::
17 * Implementation Specifics::
21 * Coding Conventions::
27 * Exception Handling::
32 @node Limitations of g++, Routines, Top, Top
33 @section Limitations of g++
Limitations on input source code: about 240 nesting levels when the parser
stack size (@code{YYSTACKSIZE}) is set to 500 (the default), with around
16.4k of swap space required per nesting level. The parser needs about
2.09 times the number of nesting levels in stack space, which is where
the limit of roughly 240 levels (500 / 2.09) comes from.
42 @cindex pushdecl_class_level
44 I suspect there are other uses of pushdecl_class_level that do not call
45 set_identifier_type_value in tandem with the call to
46 pushdecl_class_level. It would seem to be an omission.
48 @cindex access checking
50 Access checking is unimplemented for nested types.
52 @cindex @code{volatile}
54 @code{volatile} is not implemented in general.
56 @cindex pointers to members
58 Pointers to members are only minimally supported, and there are places
59 where the grammar doesn't even properly accept them yet.
61 @cindex multiple inheritance
@code{this} will be wrong in virtual member functions defined in a
virtual base class, when they are overridden in a derived class and
called via a non-left-most object.
70 extern "C" int printf(const char*, ...);
71 struct A @{ virtual void f() @{ @} @};
72 struct B : virtual A @{ int b; B() : b(0) @{@} void f() @{ b++; @} @};
81 printf ("C::b = %d, D::b = %d\n", e.C::b, e.D::b);
This will print out 2, 0 instead of 1, 1.
90 @node Routines, Implementation Specifics, Limitations of g++, Top
93 This section describes some of the routines used in the C++ front-end.
@code{build_vtable} and @code{prepare_fresh_vtable} are used only within
the @file{cp-class.c} file, and only in @code{finish_struct} and
@code{modify_vtable_entries}.
99 @code{build_vtable}, @code{prepare_fresh_vtable}, and
100 @code{finish_struct} are the only routines that set @code{DECL_VPARENT}.
@code{finish_struct} can steal the virtual function table from parents;
this prohibits @code{related_vslot} from working. When @code{finish_struct} steals,
107 get_binfo (DECL_FIELD_CONTEXT (CLASSTYPE_VFIELD (t)), t, 0)
111 will get the related binfo.
113 @code{layout_basetypes} does something with the VIRTUALS.
Supposedly (according to Tiemann) most of the breadth-first searching
done, as in @code{get_base_distance} and @code{get_binfo}, was not the
result of any design decision. I have since found out that at least one
part of the compiler needs the notion of depth-first binfo searching, so
I am going to try to convert the whole thing; it should just work. The
term left-most refers to the depth-first left-most node. It uses
@code{MAIN_VARIANT == type} as the condition to get left-most, because
the things that have @code{BINFO_OFFSET}s of zero are shared and will
have themselves as their own @code{MAIN_VARIANT}s. The non-shared right
ones are copies of the left-most one; hence if it is its own
@code{MAIN_VARIANT}, we know it IS a left-most one, if it is not, it is
@code{get_base_distance}'s path and distance matter in its use in:
132 @code{prepare_fresh_vtable} (the code is probably wrong)
134 @code{init_vfields} Depends upon distance probably in a safe way,
135 build_offset_ref might use partial paths to do further lookups,
136 hack_identifier is probably not properly checking access.
139 @code{get_first_matching_virtual} probably should check for
140 @code{get_base_distance} returning -2.
143 @code{resolve_offset_ref} should be called in a more deterministic
144 manner. Right now, it is called in some random contexts, like for
145 arguments at @code{build_method_call} time, @code{default_conversion}
146 time, @code{convert_arguments} time, @code{build_unary_op} time,
147 @code{build_c_cast} time, @code{build_modify_expr} time,
148 @code{convert_for_assignment} time, and
149 @code{convert_for_initialization} time.
151 But, there are still more contexts it needs to be called in, one was the
159 Seems that the problems were due to the fact that @code{TREE_TYPE} of
the @code{OFFSET_REF} was not an @code{OFFSET_TYPE}, but rather the type
161 of the referent (like @code{INTEGER_TYPE}). This problem was fixed by
162 changing @code{default_conversion} to check @code{TREE_CODE (x)},
163 instead of only checking @code{TREE_CODE (TREE_TYPE (x))} to see if it
164 was @code{OFFSET_TYPE}.
168 @node Implementation Specifics, Glossary, Routines, Top
169 @section Implementation Specifics
172 @item Explicit Initialization
174 The global list @code{current_member_init_list} contains the list of
175 mem-initializers specified in a constructor declaration. For example:
178 foo::foo() : a(1), b(2) @{@}
182 will initialize @samp{a} with 1 and @samp{b} with 2.
183 @code{expand_member_init} places each initialization (a with 1) on the
184 global list. Then, when the fndecl is being processed,
185 @code{emit_base_init} runs down the list, initializing them. It used to
186 be the case that g++ first ran down @code{current_member_init_list},
187 then ran down the list of members initializing the ones that weren't
188 explicitly initialized. Things were rewritten to perform the
189 initializations in order of declaration in the class. So, for the above
190 example, @samp{a} and @samp{b} will be initialized in the order that
194 class foo @{ public: int b; int a; foo (); @};
Thus, @samp{b} will be initialized with 2 first, then @samp{a} will be
initialized with 1, regardless of how they're listed in the mem-initializer list.
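
The declaration-order rule can be checked with a small standalone
program. This is only an illustrative sketch in present-day C++, not
code from the front-end; the @code{init_log} vector and the @code{note}
and @code{init_order} helpers are hypothetical names introduced here.
A compiler may warn that the mem-initializers are out of order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper state: records the order in which members are
// initialized.
static std::vector<std::string> init_log;

static int note(const char *name, int value) {
    init_log.push_back(name);
    return value;
}

// 'b' is declared before 'a', as in the class foo above.
class foo {
public:
    int b;
    int a;
    foo();
};

// The mem-initializer list names 'a' first, but declaration order wins.
foo::foo() : a(note("a", 1)), b(note("b", 2)) {}

// Constructs a foo and returns the recorded initialization order as a
// comma-separated string.
std::string init_order() {
    init_log.clear();
    foo f;
    (void)f;
    std::string out;
    for (const std::string &s : init_log) {
        if (!out.empty()) out += ",";
        out += s;
    }
    return out;
}
```

Calling @code{init_order()} yields the members in declaration order,
which is the behavior the rewritten @code{emit_base_init} is described
as implementing.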
201 @item Argument Matching
203 In early 1993, the argument matching scheme in @sc{gnu} C++ changed
204 significantly. The original code was completely replaced with a new
205 method that will, hopefully, be easier to understand and make fixing
206 specific cases much easier.
208 The @samp{-fansi-overloading} option is used to enable the new code; at
209 some point in the future, it will become the default behavior of the
212 The file @file{cp-call.c} contains all of the new work, in the functions
213 @code{rank_for_overload}, @code{compute_harshness},
214 @code{compute_conversion_costs}, and @code{ideal_candidate}.
216 Instead of using obscure numerical values, the quality of an argument
217 match is now represented by clear, individual codes. The new data
218 structure @code{struct harshness} (it used to be an @code{unsigned}
222 @item the @samp{code} field, to signify what was involved in matching two
224 @item the @samp{distance} field, used in situations where inheritance
225 decides which function should be called (one is ``closer'' than
227 @item and the @samp{int_penalty} field, used by some codes as a tie-breaker.
230 The @samp{code} field is a number with a given bit set for each type of
231 code, OR'd together. The new codes are:
234 @item @code{EVIL_CODE}
235 The argument was not a permissible match.
237 @item @code{CONST_CODE}
238 Currently, this is only used by @code{compute_conversion_costs}, to
239 distinguish when a non-@code{const} member function is called from a
240 @code{const} member function.
242 @item @code{ELLIPSIS_CODE}
243 A match against an ellipsis @samp{...} is considered worse than all others.
245 @item @code{USER_CODE}
246 Used for a match involving a user-defined conversion.
248 @item @code{STD_CODE}
249 A match involving a standard conversion.
251 @item @code{PROMO_CODE}
252 A match involving an integral promotion. For these, the
253 @code{int_penalty} field is used to handle the ARM's rule (XXX cite)
that a smaller @code{unsigned} type should promote to an @code{int}, not
255 to an @code{unsigned int}.
257 @item @code{QUAL_CODE}
258 Used to mark use of qualifiers like @code{const} and @code{volatile}.
260 @item @code{TRIVIAL_CODE}
261 Used for trivial conversions. The @samp{int_penalty} field is used by
262 @code{convert_harshness} to communicate further penalty information back
263 to @code{build_overload_call_real} when deciding which function should
267 The functions @code{convert_to_aggr} and @code{build_method_call} use
268 @code{compute_conversion_costs} to rate each argument's suitability for
269 a given candidate function (that's how we get the list of candidates for
270 @code{ideal_candidate}).
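
As a rough picture of the data structure described above, the following
sketch shows a three-field record whose @samp{code} field is a set of
OR'd bits. It is not taken from @file{cp-call.c}: the bit values, the
field types, and the @code{compare_harshness} ordering are all
assumptions made for illustration, and only the field and code names
come from the text.

```cpp
#include <cassert>

// Hypothetical bit assignments; the real values live in the front-end
// sources and are not given in this document.
enum harshness_code_bits {
    TRIVIAL_CODE  = 1 << 0,
    QUAL_CODE     = 1 << 1,
    PROMO_CODE    = 1 << 2,
    STD_CODE      = 1 << 3,
    USER_CODE     = 1 << 4,
    ELLIPSIS_CODE = 1 << 5,
    CONST_CODE    = 1 << 6,
    EVIL_CODE     = 1 << 7
};

// Field names follow the text; the types are assumptions.
struct harshness {
    unsigned code;      // OR of the *_CODE bits involved in the match
    int distance;       // inheritance distance, smaller is "closer"
    int int_penalty;    // tie-breaker used by some codes
};

// Assumed ordering: a numerically smaller code is a better match, then a
// smaller distance, then a smaller penalty.  Returns a negative value if
// 'a' is the better match, positive if 'b' is, and 0 on a tie.
int compare_harshness(const harshness &a, const harshness &b) {
    if (a.code != b.code) return a.code < b.code ? -1 : 1;
    if (a.distance != b.distance) return a.distance < b.distance ? -1 : 1;
    if (a.int_penalty != b.int_penalty)
        return a.int_penalty < b.int_penalty ? -1 : 1;
    return 0;
}
```

The point of the sketch is only that separate named bits and separate
fields make each comparison step explicit, in contrast to a single
opaque @code{unsigned} value.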
274 @node Glossary, Macros, Implementation Specifics, Top
279 The main data structure in the compiler used to represent the
280 inheritance relationships between classes. The data in the binfo can be
281 accessed by the BINFO_ accessor macros.
284 @itemx virtual function table
The virtual function table holds information used in virtual function
dispatching. In the compiler, these tables are usually referred to as
vtables, or vtbls. The first index is not used in the normal way; I
believe it is probably used for the virtual destructor.
293 vfields can be thought of as the base information needed to build
294 vtables. For every vtable that exists for a class, there is a vfield.
295 See also vtable and virtual function table pointer. When a type is used
296 as a base class to another type, the virtual function table for the
297 derived class can be based upon the vtable for the base class, just
298 extended to include the additional virtual methods declared in the
299 derived class. The virtual function table from a virtual base class is
300 never reused in a derived class. @code{is_normal} depends upon this.
302 @item virtual function table pointer
304 These are @code{FIELD_DECL}s that are pointer types that point to
305 vtables. See also vtable and vfield.
308 @node Macros, Typical Behavior, Glossary, Top
311 This section describes some of the macros used on trees. The list
312 should be alphabetical. Eventually all macros should be documented
here. There are some PostScript drawings that can be used to better
understand some of the more complex data structures; contact Mike Stump
(@code{mrs@@cygnus.com}) for information about them.
318 @item BINFO_BASETYPES
319 A vector of additional binfos for the types inherited by this basetype.
320 The binfos are fully unshared (except for virtual bases, in which
321 case the binfo structure is shared).
323 If this basetype describes type D as inherited in C,
and if the basetypes of D are E and F,
325 then this vector contains binfos for inheritance of E and F by C.
332 @item BINFO_INHERITANCE_CHAIN
333 Temporarily used to represent specific inheritances. It usually points
334 to the binfo associated with the lesser derived type, but it can be
335 reversed by reverse_path. For example:
345 BINFO_INHERITANCE_CHAIN (Xb) == YbX
346 BINFO_INHERITANCE_CHAIN (Yb) == ZbY
347 BINFO_INHERITANCE_CHAIN (Zb) == 0
Not sure if the above is really true; get_base_distance has it point
towards the most derived type, opposite from above.
353 Set by build_vbase_path, recursive_bounded_basetype_p,
354 get_base_distance, lookup_field, lookup_fnfields, and reverse_path.
356 What things can this be used on:
358 TREE_VECs that are binfos
362 The offset where this basetype appears in its containing type.
363 BINFO_OFFSET slot holds the offset (in bytes) from the base of the
364 complete object to the base of the part of the object that is allocated
365 on behalf of this `type'. This is always 0 except when there is
366 multiple inheritance.
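
The effect of multiple inheritance on base offsets can be observed from
user code. The sketch below is not front-end code; @code{Left},
@code{Right}, @code{Derived}, and @code{base_offset} are hypothetical
names, and the exact non-zero offset depends on the target ABI, so only
zero versus non-zero is assumed.

```cpp
#include <cassert>
#include <cstddef>

struct Left  { int l; };
struct Right { int r; };
struct Derived : Left, Right { int d; };

// Byte offset of the Base subobject within a complete Derived object.
template <class Base>
std::ptrdiff_t base_offset(const Derived &obj) {
    const char *whole = reinterpret_cast<const char *>(&obj);
    const char *part =
        reinterpret_cast<const char *>(static_cast<const Base *>(&obj));
    return part - whole;
}
```

On common ABIs the left-most base sits at offset 0 and the second base
at a non-zero offset, which corresponds to the non-zero
@code{BINFO_OFFSET} case described above.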
368 Used on TREE_VEC_ELTs of the binfos BINFO_BASETYPES (...) for example.
372 A unique list of functions for the virtual function table. See also
375 What things can this be used on:
377 TREE_VECs that are binfos
381 Used to find the VAR_DECL that is the virtual function table associated
382 with this binfo. See also TYPE_BINFO_VTABLE. To get the virtual
383 function table pointer, see CLASSTYPE_VFIELD.
385 What things can this be used on:
387 TREE_VECs that are binfos
391 VAR_DECLs that are virtual function tables
394 @item BLOCK_SUPERCONTEXT
395 In the outermost scope of each function, it points to the FUNCTION_DECL
396 node. It aids in better DWARF support of inline functions.
400 CLASSTYPE_TAGS is a linked (via TREE_CHAIN) list of member classes of a
401 class. TREE_PURPOSE is the name, TREE_VALUE is the type (pushclass scans
402 these and calls pushtag on them.)
404 finish_struct scans these to produce TYPE_DECLs to add to the
405 TYPE_FIELDS of the type.
It is expected that the name found in the TREE_PURPOSE slot is unique;
resolve_scope_to_name is one such place that depends upon this
412 @item CLASSTYPE_METHOD_VEC
413 The following is true after finish_struct has been called (on the
414 class?) but not before. Before finish_struct is called, things are
415 different to some extent. Contains a TREE_VEC of methods of the class.
416 The TREE_VEC_LENGTH is the number of differently named methods plus one
417 for the 0th entry. The 0th entry is always allocated, and reserved for
418 ctors and dtors. If there are none, TREE_VEC_ELT(N,0) == NULL_TREE.
419 Each entry of the TREE_VEC is a FUNCTION_DECL. For each FUNCTION_DECL,
420 there is a DECL_CHAIN slot. If the FUNCTION_DECL is the last one with a
421 given name, the DECL_CHAIN slot is NULL_TREE. Otherwise it is the next
method that has the same name (but a different signature). It might
seem that, because the DECL_CHAIN slot is used in this way, we cannot
call pushdecl to put the method in the global scope (because that would
overwrite the TREE_CHAIN slot); that does not appear to be true, because
the two use different _CHAINs. finish_struct_methods sets up one version
of the TREE_CHAIN slots on the FUNCTION_DECLs.
429 friends are kept in TREE_LISTs, so that there's no need to use their
430 TREE_CHAIN slot for anything.
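
The shape of the method vector described above can be modelled with
ordinary containers. This is a sketch under stated assumptions, not the
compiler's representation: @code{Method} and @code{MethodVec} are
hypothetical stand-ins for a @code{FUNCTION_DECL} and for
@code{CLASSTYPE_METHOD_VEC}, and @code{decl_chain} plays the role of the
@code{DECL_CHAIN} slot.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a FUNCTION_DECL; only what the sketch needs.
struct Method {
    std::string name;
    std::string signature;
    Method *decl_chain;   // next method with the same name, or nullptr
};

// Hypothetical stand-in for CLASSTYPE_METHOD_VEC.
struct MethodVec {
    std::vector<Method *> slots;

    // Slot 0 is always allocated and stays null if there are no
    // constructors or destructors.
    MethodVec() : slots(1, nullptr) {}

    // Adds a constructor or destructor to the reserved slot 0.
    void add_ctor_or_dtor(Method *m) { push_front(0, m); }

    // Adds an ordinary method, chaining it with same-named methods.
    void add_method(Method *m) {
        for (std::size_t i = 1; i < slots.size(); ++i) {
            if (slots[i]->name == m->name) {
                push_front(i, m);
                return;
            }
        }
        m->decl_chain = nullptr;
        slots.push_back(m);
    }

    // Number of methods chained together in slot i.
    std::size_t chain_length(std::size_t i) const {
        std::size_t n = 0;
        for (const Method *m = slots[i]; m; m = m->decl_chain) ++n;
        return n;
    }

private:
    void push_front(std::size_t i, Method *m) {
        m->decl_chain = slots[i];
        slots[i] = m;
    }
};
```

With two overloads of one name and one further method, the vector has
three slots: the reserved slot 0 plus one per distinct name, matching
the length rule given above.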
437 @item CLASSTYPE_VFIELD
438 Seems to be in the process of being renamed TYPE_VFIELD. Use on types
439 to get the main virtual function table pointer. To get the virtual
440 function table use BINFO_VTABLE (TYPE_BINFO ()).
444 FIELD_DECLs that are virtual function table pointers
446 What things can this be used on:
451 @item DECL_CLASS_CONTEXT
452 Identifies the context that the _DECL was found in. For virtual function
453 tables, it points to the type associated with the virtual function
454 table. See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_FCONTEXT.
456 The difference between this and DECL_CONTEXT, is that for virtuals
470 DECL_CONTEXT (A::f) == A
471 DECL_CLASS_CONTEXT (A::f) == A
473 DECL_CONTEXT (B::f) == A
474 DECL_CLASS_CONTEXT (B::f) == B
479 RECORD_TYPEs, or UNION_TYPEs
481 What things can this be used on:
487 Identifies the context that the _DECL was found in. Can be used on
488 virtual function tables to find the type associated with the virtual
489 function table, but since they are FIELD_DECLs, DECL_FIELD_CONTEXT is a
490 better access method. Internally the same as DECL_FIELD_CONTEXT, so
don't use both. See also DECL_FIELD_CONTEXT, DECL_FCONTEXT and
499 What things can this be used on:
502 VAR_DECLs that are virtual function tables
507 @item DECL_FIELD_CONTEXT
508 Identifies the context that the FIELD_DECL was found in. Internally the
same as DECL_CONTEXT, so don't use both. See also DECL_CONTEXT,
510 DECL_FCONTEXT and DECL_CLASS_CONTEXT.
516 What things can this be used on:
519 FIELD_DECLs that are virtual function pointers
524 @item DECL_NESTED_TYPENAME
525 Holds the fully qualified type name. Example, Base::Derived.
531 What things can this be used on:
541 0 for things that don't have names
542 IDENTIFIER_NODEs for TYPE_DECLs
546 A bit that can be set to inform the debug information output routines in
547 the back-end that a certain _DECL node should be totally ignored.
549 Used in cases where it is known that the debugging information will be
550 output in another file, or where a sub-type is known not to be needed
551 because the enclosing type is not needed.
A compiler-constructed virtual destructor in a derived class that does
not define an explicit destructor, where one was defined explicitly in a
base class, has this bit set as well. Also used on __FUNCTION__ and
__PRETTY_FUNCTION__ to mark that they are ``compiler generated.'' c-decl and
c-lex.c both want DECL_IGNORED_P set for ``internally generated vars,''
and ``user-invisible variable.''
560 Functions built by the C++ front-end such as default destructors,
561 virtual destructors and default constructors want to be marked that
562 they are compiler generated, but unsure why.
564 Currently, it is used in an absolute way in the C++ front-end, as an
565 optimization, to tell the debug information output routines to not
566 generate debugging information that will be output by another separately
571 A flag used on FIELD_DECLs and VAR_DECLs. (Documentation in tree.h is
572 wrong.) Used in VAR_DECLs to indicate that the variable is a vtable.
573 It is also used in FIELD_DECLs for vtable pointers.
575 What things can this be used on:
577 FIELD_DECLs and VAR_DECLs
581 Used to point to the parent type of the vtable if there is one, else it
582 is just the type associated with the vtable. Because of the sharing of
583 virtual function tables that goes on, this slot is not very useful, and
is, in fact, not used in the compiler at all. It can be removed.
586 What things can this be used on:
588 VAR_DECLs that are virtual function tables
592 RECORD_TYPEs maybe UNION_TYPEs
596 Used to find the first baseclass in which this FIELD_DECL is defined.
597 See also DECL_CONTEXT, DECL_FIELD_CONTEXT and DECL_CLASS_CONTEXT.
601 Used when writing out debugging information about vfield and
604 What things can this be used on:
606 FIELD_DECLs that are virtual function pointers
610 @item DECL_REFERENCE_SLOT
Used to hold the initializer for the reference.
613 What things can this be used on:
615 PARM_DECLs and VAR_DECLs that have a reference type
619 Used for FUNCTION_DECLs in two different ways. Before the structure
620 containing the FUNCTION_DECL is laid out, DECL_VINDEX may point to a
621 FUNCTION_DECL in a base class which is the FUNCTION_DECL which this
622 FUNCTION_DECL will replace as a virtual function. When the class is
623 laid out, this pointer is changed to an INTEGER_CST node which is
624 suitable to find an index into the virtual function table. See
625 get_vtable_entry as to how one can find the right index into the virtual
function table. The first index, 0, of a virtual function table is not
627 used in the normal way, so the first real index is 1.
629 DECL_VINDEX may be a TREE_LIST, that would seem to be a list of
630 overridden FUNCTION_DECLs. add_virtual_function has code to deal with
631 this when it uses the variable base_fndecl_list, but it would seem that
somehow, it is possible for the TREE_LIST to persist until method_call,
636 What things can this be used on:
641 @item DECL_SOURCE_FILE
642 Identifies what source file a particular declaration was found in.
646 "<built-in>" on TYPE_DECLs to mean the typedef is built in
649 @item DECL_SOURCE_LINE
650 Identifies what source line number in the source file the declaration
656 0 for an undefined label
658 0 for TYPE_DECLs that are internally generated
660 0 for FUNCTION_DECLs for functions generated by the compiler
661 (not yet, but should be)
663 0 for ``magic'' arguments to functions, that the user has no
675 @item TREE_ADDRESSABLE
676 A flag that is set for any type that has a constructor.
679 @item TREE_COMPLEXITY
They seem a kludgy way to track recursion, popping, and pushing. They only
appear in cp-decl.c and cp-decl2.c, so they are a good candidate for
proper fixing, and removal.
686 Set for FIELD_DECLs by finish_struct. But not uniformly set.
688 The following routines do something with PRIVATE access:
689 build_method_call, alter_access, finish_struct_methods,
690 finish_struct, convert_to_aggr, CWriteLanguageDecl, CWriteLanguageType,
691 CWriteUseObject, compute_access, lookup_field, dfs_pushdecl,
692 GNU_xref_member, dbxout_type_fields, dbxout_type_method_1
696 The following routines do something with PROTECTED access:
697 build_method_call, alter_access, finish_struct, convert_to_aggr,
698 CWriteLanguageDecl, CWriteLanguageType, CWriteUseObject,
699 compute_access, lookup_field, GNU_xref_member, dbxout_type_fields,
704 Used to get the binfo for the type.
708 TREE_VECs that are binfos
710 What things can this be used on:
715 @item TYPE_BINFO_BASETYPES
716 See also BINFO_BASETYPES.
718 @item TYPE_BINFO_VIRTUALS
719 A unique list of functions for the virtual function table. See also
722 What things can this be used on:
727 @item TYPE_BINFO_VTABLE
728 Points to the virtual function table associated with the given type.
729 See also BINFO_VTABLE.
731 What things can this be used on:
737 VAR_DECLs that are virtual function tables
746 0 for things that don't have names.
747 should be IDENTIFIER_NODE for RECORD_TYPEs UNION_TYPEs and
749 TYPE_DECL for RECORD_TYPEs, UNION_TYPEs and ENUM_TYPEs, but
751 TYPE_DECL for typedefs, unsure why.
754 What things can one use this on:
765 It currently points to the TYPE_DECL for RECORD_TYPEs,
766 UNION_TYPEs and ENUM_TYPEs, but it should be history soon.
770 Synonym for @code{CLASSTYPE_METHOD_VEC}. Chained together with
771 @code{TREE_CHAIN}. @file{dbxout.c} uses this to get at the methods of a
776 Used to represent typedefs, and used to represent bindings layers.
780 DECL_NAME is the name of the typedef. For example, foo would
781 be found in the DECL_NAME slot when @code{typedef int foo;} is
784 DECL_SOURCE_LINE identifies what source line number in the
785 source file the declaration was found at. A value of 0
786 indicates that this TYPE_DECL is just an internal binding layer
787 marker, and does not correspond to a user supplied typedef.
792 A linked list (via @code{TREE_CHAIN}) of member types of a class. The
793 list can contain @code{TYPE_DECL}s, but there can also be other things
794 in the list apparently. See also @code{CLASSTYPE_TAGS}.
798 A flag used on a @code{FIELD_DECL} or a @code{VAR_DECL}, indicates it is
799 a virtual function table or a pointer to one. When used on a
800 @code{FUNCTION_DECL}, indicates that it is a virtual function. When
801 used on an @code{IDENTIFIER_NODE}, indicates that a function with this
802 same name exists and has been declared virtual.
804 When used on types, it indicates that the type has virtual functions, or
805 is derived from one that does.
807 Not sure if the above about virtual function tables is still true. See
808 also info on @code{DECL_VIRTUAL_P}.
810 What things can this be used on:
812 FIELD_DECLs, VAR_DECLs, FUNCTION_DECLs, IDENTIFIER_NODEs
815 @item VF_BASETYPE_VALUE
816 Get the associated type from the binfo that caused the given vfield to
817 exist. This is the least derived class (the most parent class) that
818 needed a virtual function table. It is probably the case that all uses
819 of this field are misguided, but they need to be examined on a
820 case-by-case basis. See history for more information on why the
821 previous statement was made.
823 Set at @code{finish_base_struct} time.
825 What things can this be used on:
827 TREE_LISTs that are vfields
831 This field was used to determine if a virtual function table's
832 slot should be filled in with a certain virtual function, by
833 checking to see if the type returned by VF_BASETYPE_VALUE was a
834 parent of the context in which the old virtual function existed.
835 This incorrectly assumes that a given type _could_ not appear as
836 a parent twice in a given inheritance lattice. For single
837 inheritance, this would in fact work, because a type could not
838 possibly appear more than once in an inheritance lattice, but
839 with multiple inheritance, a type can appear more than once.
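
The multiple-inheritance case mentioned above is easy to reproduce in
user code. The following sketch is illustrative only; the class names
and the helper @code{has_two_A_subobjects} are hypothetical. Because
@code{A} is inherited non-virtually along two paths, a @code{D} object
contains two separate @code{A} subobjects.

```cpp
#include <cassert>

struct A { int x; };
struct B : A {};
struct C : A {};
struct D : B, C {};   // contains two distinct A subobjects

// Returns true when the A reached through B and the A reached through C
// are different subobjects of the same D.
bool has_two_A_subobjects(D &d) {
    A *via_b = static_cast<B *>(&d);
    A *via_c = static_cast<C *>(&d);
    return via_b != via_c;
}
```

Any test of the form ``is @code{A} a parent of this context?'' cannot
tell these two occurrences apart, which is the weakness the history
note describes.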
843 Identifies the binfo that caused this vfield to exist. If this vfield
844 is from the first direct base class that has a virtual function table,
845 then VF_BINFO_VALUE is NULL_TREE, otherwise it will be the binfo of the
846 direct base where the vfield came from. Can use @code{TREE_VIA_VIRTUAL}
847 on result to find out if it is a virtual base class. Related to the
851 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
855 where @samp{t} is the type that has the given vfield.
858 get_binfo (VF_BASETYPE_VALUE (vfield), t, 0)
will return the binfo for the given vfield.
864 May or may not be set at @code{modify_vtable_entries} time. Set at
865 @code{finish_base_struct} time.
867 What things can this be used on:
869 TREE_LISTs that are vfields
872 @item VF_DERIVED_VALUE
873 Identifies the type of the most derived class of the vfield, excluding
the class this vfield is for.
876 Set at @code{finish_base_struct} time.
878 What things can this be used on:
880 TREE_LISTs that are vfields
883 @item VF_NORMAL_VALUE
884 Identifies the type of the most derived class of the vfield, including
885 the class this vfield is for.
887 Set at @code{finish_base_struct} time.
889 What things can this be used on:
891 TREE_LISTs that are vfields
894 @item WRITABLE_VTABLES
This is an option that can be defined when building the compiler, that
will cause the compiler to output vtables into the data segment so that
the vtables may be written. This is undefined by default, because
normally the vtables should be unwritable. People who implement object
I/O facilities, or who want to change the dynamic type of objects, may
want to have the vtables writable. Another way of achieving this would
be to make a copy of the vtable into writable memory, but the drawback
there is that that method only changes the type for one object.
906 @node Typical Behavior, Coding Conventions, Macros, Top
907 @section Typical Behavior
911 Whenever seemingly normal code fails with errors like
912 @code{syntax error at `\@{'}, it's highly likely that grokdeclarator is
913 returning a NULL_TREE for whatever reason.
915 @node Coding Conventions, Templates, Typical Behavior, Top
916 @section Coding Conventions
It should never be the case that trees are modified in-place by the
919 back-end, @emph{unless} it is guaranteed that the semantics are the same
920 no matter how shared the tree structure is. @file{fold-const.c} still
921 has some cases where this is not true, but rms hypothesizes that this
922 will never be a problem.
924 @node Templates, Access Control, Coding Conventions, Top
927 g++ uses the simple approach to instantiating templates: it blindly
928 generates the code for each instantiation as needed. For class
929 templates, g++ pushes the template parameters into the namespace for the
930 duration of the instantiation; for function templates, it's a simple
933 This approach does not support any of the template definition-time error
934 checking that is being bandied about by X3J16. It makes no attempt to deal
935 with name binding in a consistent way.
937 Instantiation of a class template is triggered by the use of a template
938 class anywhere but in a straight declaration like @code{class A<int>}.
939 This is wrong; in fact, it should not be triggered by typedefs or
940 declarations of pointers. Now that explicit instantiation is supported,
941 this misfeature is not necessary.
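
The intended behavior can be illustrated with a template that would
fail to compile if it were instantiated for @code{int}. This is a
sketch of what a conforming compiler accepts, not a description of the
g++ behavior criticized above; @code{A}, @code{AI}, @code{Box}, and
@code{box_get} are hypothetical names.

```cpp
#include <cassert>

// Instantiating A<int> would be an error, because int has no nested
// 'type'.
template <class T>
struct A {
    typename T::type member;
};

// Neither a typedef nor a pointer declaration needs A<int> to be a
// complete type, so neither should trigger instantiation.
typedef A<int> AI;
AI *unused_pointer = nullptr;

// A second template, explicitly instantiated for int.
template <class T>
struct Box {
    T value;
    T get() const { return value; }
};
template struct Box<int>;   // explicit instantiation

// Uses the explicitly instantiated Box<int>.
int box_get(int v) {
    Box<int> b = { v };
    return b.get();
}
```

If the typedef or the pointer declaration triggered instantiation of
@code{A<int>}, this translation unit would be rejected.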
946 @item instantiate_class_template
950 @node Access Control, Error Reporting, Templates, Top
951 @section Access Control
952 The function compute_access returns one of three values:
956 means that the field can be accessed by the current lexical scope.
958 @item access_protected
959 means that the field cannot be accessed by the current lexical scope
960 because it is protected.
963 means that the field cannot be accessed by the current lexical scope
964 because it is private.
967 DECL_ACCESS is used for access declarations; alter_access creates a list
968 of types and accesses for a given decl.
970 Formerly, DECL_@{PUBLIC,PROTECTED,PRIVATE@} corresponded to the return
971 codes of compute_access and were used as a cache for compute_access.
972 Now they are not used at all.
974 TREE_PROTECTED and TREE_PRIVATE are used to record the access levels
975 granted by the containing class. BEWARE: TREE_PUBLIC means something
976 completely unrelated to access control!
978 @node Error Reporting, Parser, Access Control, Top
979 @section Error Reporting
981 The C++ front-end uses a call-back mechanism to allow functions to print
982 out reasonable strings for types and functions without putting extra
983 logic in the functions where errors are found. The interface is through
984 the @code{cp_error} function (or @code{cp_warning}, etc.). The
985 syntax is exactly like that of @code{error}, except that a few more
986 conversions are supported:
990 %C indicates a value of `enum tree_code'.
992 %D indicates a *_DECL node.
994 %E indicates a *_EXPR node.
996 %L indicates a value of `enum languages'.
998 %P indicates the name of a parameter (i.e. "this", "1", "2", ...)
1000 %T indicates a *_TYPE node.
1002 %O indicates the name of an operator (MODIFY_EXPR -> "operator =").
1006 There is some overlap between these; for instance, any of the node
1007 options can be used for printing an identifier (though only @code{%D}
1008 tries to decipher function names).
1010 For a more verbose message (@code{class foo} as opposed to just @code{foo},
1011 including the return type for functions), use @code{%#c}.
1012 To have the line number on the error message indicate the line of the
1013 DECL, use @code{cp_error_at} and its ilk; to indicate which argument you want,
1014 use @code{%+D}, or it will default to the first.
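
The general idea of dispatching on a conversion letter to a printing
call-back can be sketched as follows. This is a toy model and not the
implementation of @code{cp_error}: @code{Node}, @code{Printer},
@code{toy_format}, and @code{default_printers} are hypothetical, and
only the conversion letters @code{%D} and @code{%T} are borrowed from
the list above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical node: a kind tag plus a printable name.
struct Node {
    std::string kind;   // e.g. "decl" or "type"
    std::string name;
};

using Printer = std::function<std::string(const Node &)>;

// Toy formatter: each conversion letter maps to a call-back that knows
// how to print one kind of node.  Unknown conversions are copied as-is.
std::string toy_format(const std::string &fmt,
                       const std::vector<Node> &args,
                       const std::map<char, Printer> &printers) {
    std::string out;
    std::size_t next = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] == '%' && i + 1 < fmt.size()) {
            auto it = printers.find(fmt[i + 1]);
            if (it != printers.end() && next < args.size()) {
                out += it->second(args[next++]);
                ++i;   // skip the conversion letter
                continue;
            }
        }
        out += fmt[i];
    }
    return out;
}

// Call-backs for two of the conversions listed above.
std::map<char, Printer> default_printers() {
    std::map<char, Printer> p;
    p['D'] = [](const Node &n) { return n.name; };
    p['T'] = [](const Node &n) { return n.name; };
    return p;
}
```

The caller supplies only a format string and nodes; the knowledge of
how to print each kind of node stays in the call-backs, which is the
separation the text describes.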
1016 @node Parser, Copying Objects, Error Reporting, Top
1019 Some comments on the parser:
1021 The @code{after_type_declarator} / @code{notype_declarator} hack is
1022 necessary in order to allow redeclarations of @code{TYPENAME}s, for
1032 In the above, the first @code{foo} is parsed as a @code{notype_declarator},
1033 and the second as a @code{after_type_declarator}.
1037 There are currently four reduce/reduce ambiguities in the parser. They are:
1039 1) Between @code{template_parm} and
1040 @code{named_class_head_sans_basetype}, for the tokens @code{aggr
1041 identifier}. This situation occurs in code looking like
1044 template <class T> class A @{ @};
1047 It is ambiguous whether @code{class T} should be parsed as the
1048 declaration of a template type parameter named @code{T} or an unnamed
1049 constant parameter of type @code{class T}. Section 14.6, paragraph 3 of
1050 the January '94 working paper states that the first interpretation is
1051 the correct one. This ambiguity results in two reduce/reduce conflicts.
1053 2) Between @code{primary} and @code{type_id} for code like @samp{int()}
1054 in places where both can be accepted, such as the argument to
1055 @code{sizeof}. Section 8.1 of the pre-San Diego working paper specifies
1056 that these ambiguous constructs will be interpreted as @code{typename}s.
1057 This ambiguity results in six reduce/reduce conflicts between
1058 @samp{absdcl} and @samp{functional_cast}.
1060 3) Between @code{functional_cast} and
1061 @code{complex_direct_notype_declarator}, for various token strings.
1062 This situation occurs in code looking like
1068 This code is ambiguous; it could be a declaration of the variable
1069 @samp{a} as a pointer to @samp{int}, or it could be a functional cast of
1070 @samp{*a} to @samp{int}. Section 6.8 specifies that the former
1071 interpretation is correct. This ambiguity results in 7 reduce/reduce
1072 conflicts. Another aspect of this ambiguity is code like 'int (x[2]);',
1073 which is resolved at the '[' and accounts for 6 reduce/reduce conflicts
1074 between @samp{direct_notype_declarator} and
1075 @samp{primary}/@samp{overqualified_id}. Finally, there are 4 r/r
1076 conflicts between @samp{expr_or_declarator} and @samp{primary} over code
1077 like 'int (a);', which could probably be resolved but would also
1078 probably be more trouble than it's worth. In all, this situation
1079 accounts for 17 conflicts. Ack!
1081 The second case above is responsible for the failure to parse 'LinppFile
1082 ppfile (String (argv[1]), &outs, argc, argv);' (from Rogue Wave
1083 Math.h++) as an object declaration, and must be fixed so that it does
1084 not resolve until later.
1086 4) Indirectly between @code{after_type_declarator} and @code{parm}, for
1087 type names. This occurs in (as one example) code like
1090 typedef int foo, bar;
1096 What is @code{bar} inside the class definition? We currently interpret
1097 it as a @code{parm}, as does Cfront, but IBM xlC interprets it as an
1098 @code{after_type_declarator}. I believe that xlC is correct, in light
1099 of 7.1p2, which says "The longest sequence of @i{decl-specifiers} that
1100 could possibly be a type name is taken as the @i{decl-specifier-seq} of
1101 a @i{declaration}." However, it seems clear that this rule must be
1102 violated in the case of constructors. This ambiguity accounts for 8
1105 Unlike the others, this ambiguity is not recognized by the Working Paper.
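As a small illustration of the third case, the following sketch shows statements that could be read either as declarations or as functional casts, and that a conforming compiler takes as declarations. The function name @code{declaration_wins} is made up for this example; nothing here is taken from the g++ sources.

```cpp
#include <cassert>

// Hypothetical helper, named only for this sketch.
int declaration_wins()
{
  int x = 5;
  int (*a);        // taken as a declaration: 'a' is a pointer to int
  a = &x;          // only valid because the line above declared 'a'
  int (y[2]);      // likewise a declaration: 'y' is an array of two ints
  y[0] = *a;
  y[1] = y[0] + 1;
  return y[0] + y[1];
}
```

If the parenthesized forms were treated as casts, the later uses of @code{a} and @code{y} would refer to undeclared names and the function would not compile.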
1107 @node Copying Objects, Exception Handling, Parser, Top
1108 @section Copying Objects
1110 The generated copy assignment operator in g++ does not currently do the
1111 right thing for multiple inheritance involving virtual bases; it just
1112 calls the copy assignment operators for its direct bases. What it
1113 should probably do is:
1115 1) Split up the copy assignment operator for all classes that have
1116 vbases into "copy my vbases" and "copy everything else" parts. Or do
1117 the trickiness that the constructors do to ensure that vbases don't get
1118 initialized by intermediate bases.
1120 2) Wander through the class lattice, find all vbases for which no
1121 intermediate base has a user-defined copy assignment operator, and call
1122 their "copy everything else" routines. If not all of my vbases satisfy
1123 this criterion, warn, because this may be surprising behavior.
1125 3) Call the "copy everything else" routine for my direct bases.
1127 If we only have one direct base, we can just foist everything off onto
1130 This issue is currently under discussion in the core reflector
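To make the problem concrete, here is a minimal sketch of a lattice with one virtual base reached through two direct bases. The class names and the counter @code{vbase_assignments} are assumptions made for the example. The count of assignments to the virtual base is deliberately not stated, since how often the generated operator assigns it is exactly the point in question.

```cpp
#include <cassert>

static int vbase_assignments = 0;   // counts calls to V::operator=

struct V {
  int v;
  V() : v(0) {}
  V& operator=(const V& other)
  {
    v = other.v;
    ++vbase_assignments;
    return *this;
  }
};

struct B : virtual V { int b; B() : b(0) {} };
struct C : virtual V { int c; C() : c(0) {} };
struct D : B, C { int d; D() : d(0) {} };

// Hypothetical helper: assign one D to another using the generated
// copy assignment operators and report how often V was assigned.
int count_vbase_assignments()
{
  D from, to;
  from.v = 7;
  vbase_assignments = 0;
  to = from;               // generated operator= for D, B and C
  assert(to.v == 7);       // the value arrives either way
  return vbase_assignments;
}
```

A result greater than one means the virtual base was copied once per intermediate base, which is the behavior the splitting scheme above is meant to avoid.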
1133 @node Exception Handling, Free Store, Copying Objects, Top
1134 @section Exception Handling
1136 Note, exception handling in g++ is still under development.
1138 This section describes the mapping of C++ exceptions in the C++
1139 front-end, into the back-end exception handling framework.
1141 The basic mechanism of exception handling in the back-end is
1142 unwind-protect a la elisp. This is a general, robust, and language
1143 independent representation for exceptions.
1145 The C++ front-end exceptions are mapped into the unwind-protect
1146 semantics by the C++ front-end. The mapping is described below.
1148 When -frtti is used, rtti is used to do exception object type checking;
1149 when it isn't used, the encoded name for the type of the object being
1150 thrown is used instead. All code that originates exceptions, even code
1151 that throws exceptions as a side effect, like dynamic casting, and all
1152 code that catches exceptions must be compiled the same way, with either
1153 -frtti or -fno-rtti. It is not possible to mix rtti-based exception
1154 handling objects with code that doesn't use rtti. The exceptions to this are
1155 code that doesn't catch or throw exceptions, catch (...), and code that
1156 just rethrows an exception.
1158 Currently we use the normal mangling used in building function names
1159 (int is "i", const char * is "PCc") to build the non-rtti-based type
1160 descriptors for exception handling. These descriptors are just plain
1161 NULL terminated strings, and internally they are passed around as char
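A minimal sketch of what matching on such descriptors amounts to, assuming nothing more than the two encodings given above. The function @code{descriptor_matches} is a made-up name for this example and is not a routine in the front-end or in libgcc2. Only exact matching is shown.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper: a non-rtti type descriptor is a plain
// NUL-terminated string holding the mangled type name, so an exact
// type match is a string comparison.
bool descriptor_matches(const char* thrown, const char* handler)
{
  return std::strcmp(thrown, handler) == 0;
}
```

With this scheme a thrown @code{int} ("i") is caught by a handler for "i" but not by one for "PCc", and no conversions between related types are attempted.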
1164 In C++, all cleanups should be protected by exception regions. The
1165 region starts just after the reason why the cleanup is created has
1166 ended. For example, with an automatic variable, that has a constructor,
1167 it would be right after the constructor is run. The region ends just
1168 before the finalization is expanded. Since the backend may expand the
1169 cleanup multiple times along different paths, once for normal end of the
1170 region, once for non-local gotos, once for returns, etc, the backend
1171 must take special care to protect the finalization expansion, if the
1172 expansion is for any other reason than normal region end, and it is
1173 `inline' (it is inside the exception region). The backend can either
1174 choose to move them out of line, or it can create an exception region
1175 over the finalization to protect it, and in the handler associated with
1176 it, it would not run the finalization as it otherwise would have, but
1177 rather just rethrow to the outer handler, careful to skip the normal
1178 handler for the original region.
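The rule that the region starts only after the constructor has run can be observed from user code. The sketch below uses made-up names (@code{Tracked}, @code{destructors_run}) and shows only the source-level behavior, not how the regions are laid out by the backend.

```cpp
#include <cassert>

static int destroyed = 0;   // counts Tracked destructors that have run

struct Tracked {
  explicit Tracked(bool fail_in_ctor)
  {
    // Throwing here means construction never finished, so the
    // cleanup for this object is not yet protected by a region.
    if (fail_in_ctor)
      throw 1;
  }
  ~Tracked() { ++destroyed; }
};

// Hypothetical helper: report how many cleanups ran during unwinding.
int destructors_run(bool fail_in_ctor)
{
  destroyed = 0;
  try {
    Tracked t(fail_in_ctor);
    throw 2;               // thrown inside the region protecting t
  } catch (int) {
  }
  return destroyed;
}
```

A throw after construction runs the cleanup once; a throw from inside the constructor runs no cleanup for that object.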
1180 In Ada, they will use the more runtime intensive approach of having
1181 fewer regions, but at the cost of additional work at run time, to keep a
1182 list of things that need cleanups. When a variable has finished
1183 construction, they add the cleanup to the list; when they come to the end
1184 of the lifetime of the variable, they run the list down. If they take a
1185 hit before the section finishes normally, they examine the list for
1186 actions to perform. I hope they add this logic into the back-end, as it
1187 would be nice to get that alternative approach in C++.
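The list-based approach might look roughly like the following sketch. Everything here (@code{CleanupList}, @code{run_down_to}, the use of a mark to delimit a scope) is an assumption made for illustration; it is not taken from any Ada or gcc implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef void (*cleanup_fn)(void*);

struct CleanupEntry {
  cleanup_fn fn;     // action to perform
  void* object;      // object the action applies to
};

// Hypothetical run-time list of pending cleanups.
class CleanupList {
  std::vector<CleanupEntry> entries_;
public:
  std::size_t size() const { return entries_.size(); }

  // Called once a variable has finished construction.
  void add(cleanup_fn fn, void* object)
  {
    CleanupEntry e = { fn, object };
    entries_.push_back(e);
  }

  // Called at the end of a scope, or when a hit is taken: run the
  // pending cleanups newest first until only 'mark' entries remain.
  void run_down_to(std::size_t mark)
  {
    while (entries_.size() > mark) {
      CleanupEntry e = entries_.back();
      entries_.pop_back();
      e.fn(e.object);
    }
  }
};

static void bump(void* counter) { ++*static_cast<int*>(counter); }

int run_cleanup_list_demo()
{
  CleanupList list;
  int cleaned = 0;
  std::size_t mark = list.size();   // scope entry
  list.add(bump, &cleaned);         // first variable constructed
  list.add(bump, &cleaned);         // second variable constructed
  list.run_down_to(mark);           // scope exit or exception
  return cleaned;
}
```

The trade is the one described above: a single region can cover the whole scope, at the price of maintaining the list while the program runs.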
1189 On an rs6000, xlC stores exception objects on the stack, under the try
1190 block. When it unwinds down into a handler, the frame pointer is
1191 adjusted back to the normal value for the frame in which the handler
1192 resides, and the stack pointer is left unchanged from the time at which
1193 the object was thrown. This is so that there is always someplace for
1194 the exception object, and nothing can overwrite it, once we start
1195 throwing. The only bad part is that the stack remains large.
1197 The below points out some flaws in g++'s exception handling, as it now
1200 Only exact type matching or reference matching of throw types works when
1201 -fno-rtti is used. Exception handling only works on SPARC (like Suns),
1202 i386, arm and rs6000 machines. Partial support is also in for alpha, hppa,
1203 m68k and mips machines, but a stack unwinder called __unwind_function has
1204 to be written and added to libgcc2 for them. See below for details on
1205 __unwind_function. All completely constructed temps and local variables
1206 are cleaned up in all unwound scopes. Completed parts of partially
1207 constructed objects are cleaned up with the exception that partially
1208 built arrays are not cleaned up as required. Don't expect exception
1209 handling to work right if you optimize; in fact, the compiler will
1210 probably core dump. If two EH regions are the exact same size, the
1211 backend cannot tell which one is first. It punts by picking the last
1212 one, if they tie. This is usually right. We really should stick in a
1213 nop, if they are the same size.
1215 When we invoke the copy constructor for an exception object because it
1216 is passed by value, and if we take a hit (exception) inside the copy
1217 constructor someplace, where do we go? I have tentatively chosen to
1218 not catch throws by the outer block at the same unwind level, if one
1219 exists, but rather to allow the frame to unwind into the next series of
1220 handlers, if any. If this is the wrong way to do it, we will need to
1221 protect the rest of the handler in some fashion. Maybe just changing
1222 the handler's handler to protect the whole series of handlers is the
1223 right way to go. This part is wrong. We should call terminate if an
1224 exception is thrown while doing things like trying to copy the exception
1227 Exception specifications are handled syntactically, but not semantically.
1228 build_exception_variant should sort the incoming list, so that it
1229 implements set compares, not exact list equality. Type smashing should
1230 smash exception specifications using set union.
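A sketch of the intended set semantics, using strings as stand-ins for the types in a specification. The names @code{SpecList}, @code{normalize}, @code{same_spec} and @code{spec_union} are made up for this example; the real @code{build_exception_variant} works on trees, not strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

typedef std::vector<std::string> SpecList;

// Sort and drop duplicates so that list equality becomes set equality.
SpecList normalize(SpecList spec)
{
  std::sort(spec.begin(), spec.end());
  spec.erase(std::unique(spec.begin(), spec.end()), spec.end());
  return spec;
}

bool same_spec(const SpecList& a, const SpecList& b)
{
  return normalize(a) == normalize(b);
}

// Merging two specifications takes the union of their types.
SpecList spec_union(const SpecList& a, const SpecList& b)
{
  SpecList merged(a);
  merged.insert(merged.end(), b.begin(), b.end());
  return normalize(merged);
}

// Hypothetical check: throw (int, const char *) versus
// throw (const char *, int) should compare equal.
bool order_is_ignored()
{
  SpecList a, b;
  a.push_back("i");   a.push_back("PCc");
  b.push_back("PCc"); b.push_back("i");
  return same_spec(a, b) && spec_union(a, b).size() == 2;
}
```

Sorting once when the variant is built keeps later comparisons a plain element-by-element walk.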
1232 Thrown objects are allocated on the heap, in the usual way, but they are
1233 never deleted. They should be deleted by the catch clauses. If one
1234 runs out of heap space, throwing an object will probably never work.
1235 This could be relaxed some by passing an __in_chrg parameter to track
1236 who has control over the exception object.
1238 When the backend returns a value, it can create new exception regions
1239 that need protecting. The new region should rethrow the object in
1240 context of the last associated cleanup that ran to completion.
1242 The __unwind_function takes a pointer to the throw handler, and is
1243 expected to pop the stack frame that was built to call it, as well as
1244 the frame underneath and then jump to the throw handler. It must not
1245 change the three registers allocated for the pointer to the exception
1246 object, the pointer to the type descriptor that identifies the type of
1247 the exception object, and the pointer to the code that threw. On hppa,
1248 these are %r5, %r6, %r7. On m68k these are a2, a3, a4. On mips they
1249 are s0, s1, s2. On Alpha these are $9, $10, $11. It takes about a day
1250 to write this routine, if someone wants to volunteer to write this
1251 routine for any architecture, exception support for that architecture
1252 will be added to g++. Please send in those code donations.
1255 The backend must be extended to fully support exceptions. Right now
1256 there are a few hooks from the backend into the alpha exception handling
1257 backend that resides in the C++ frontend, and these hooks allow exception
1258 handling to work in g++. An exception region is a segment of generated
1259 code that has a handler associated with it. The exception regions are
1260 represented in the generated code as address ranges, each given by a
1261 starting PC value and an ending PC value. Some of the limitations
1262 with this scheme are:
1266 The backend replicates insns for such things as loop unrolling and
1267 function inlining. Right now, there are no hooks into the frontend's
1268 exception handling backend to handle the replication of insns. When
1269 replication happens, a new exception region descriptor needs to be
1270 generated for the new region.
1273 The backend expects to be able to rearrange code, for things like jump
1274 optimization. Any rearranging of the code needs to have exception region
1275 descriptors updated appropriately.
1278 The backend can eliminate dead code. Any associated exception region
1279 descriptor that refers to fully contained code that has been eliminated
1280 should also be removed, although not doing this is harmless in terms of
1285 The above is not meant to be exhaustive, but does include all things I
1286 have thought of so far. I am sure other limitations exist.
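For readers who want a picture of the address-range scheme, here is a rough sketch of a region table and a lookup. The structure @code{EHRegion}, the function @code{find_handler}, the half-open treatment of the ending PC and the integer handler ids are all assumptions made for the example; they do not describe the actual tables g++ emits. Ties between regions of the same size go to the later entry, as described earlier in this section.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical descriptor for one exception region.
struct EHRegion {
  unsigned long start_pc;   // first address covered
  unsigned long end_pc;     // assumed exclusive for this sketch
  int handler;              // stand-in for the handler address
};

// Return the handler of the smallest region containing pc, or -1
// if no region covers it.  Equal sizes pick the later entry.
int find_handler(const EHRegion* table, std::size_t count, unsigned long pc)
{
  int best = -1;
  unsigned long best_size = 0;
  for (std::size_t i = 0; i < count; ++i) {
    const EHRegion& r = table[i];
    if (pc < r.start_pc || pc >= r.end_pc)
      continue;
    unsigned long size = r.end_pc - r.start_pc;
    if (best < 0 || size <= best_size) {
      best = static_cast<int>(i);
      best_size = size;
    }
  }
  return best < 0 ? -1 : table[best].handler;
}

int lookup_demo(unsigned long pc)
{
  static const EHRegion table[] = {
    { 100, 200, 1 },   // outer region
    { 120, 160, 2 },   // nested region
    { 120, 160, 3 }    // same size as the previous one
  };
  return find_handler(table, 3, pc);
}
```

Any pass that copies, moves or deletes insns would have to keep such a table in step with the code, which is what the limitations listed above come down to.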
1288 @node Free Store, Concept Index, Exception Handling, Top
1291 operator new [] adds a magic cookie to the beginning of arrays for which
1292 the number of elements will be needed by operator delete []. These are
1293 arrays of objects with destructors and arrays of objects that define
1294 operator delete [] with the optional size_t argument. This cookie can
1295 be examined from a program as follows:
1298 typedef unsigned long size_t;
1299 extern "C" int printf (const char *, ...);
1301 size_t nelts (void *p)
1304 size_t nelts __attribute__ ((aligned (sizeof (double))));
1307 cookie *cp = (cookie *)p;
1320 printf ("%lu\n", nelts (ap));
1325 The linkage code in g++ is horribly twisted in order to meet two design goals:
1327 1) Avoid unnecessary emission of inlines and vtables.
1329 2) Support pedantic assemblers like the one in AIX.
1331 To meet the first goal, we defer emission of inlines and vtables until
1332 the end of the translation unit, where we can decide whether or not they
1333 are needed, and how to emit them if they are.
1335 @node Concept Index, , Free Store, Top
1336 @section Concept Index