gcc/PROJECTS

   1 C++ template friend functions (mmitchell@usa.net)
   2
   3 Haifa scheduler (haifa-sched.c, loop.[ch], unroll.[ch], genattrtab.c):
   4 (contact law@cygnus.com before starting any serious haifa work)
   5
   6   * Fix all the formatting problems.  Simple, mindless work.
   7
   8   * Fix/add comments throughout the code.  Many of the comments are from
   9   the old scheduler and are out of date and misleading.  Many new hunks
  10   of code don't have sufficient comments and documentation.  Those which
  11   do have comments need to be rewritten to use complete sentences and
  12   proper formatting.
  13
  14   * Someone needs make one (or more) passes over the scheduler as a whole to
  15   just clean it up.  Try to move the machine dependent bits into the target
  16   files where they belong, avoid re-creating functions where or near
  17   equivalents already exist (ie is_conditional_branch and friends), etc., etc.
  18
  19   * Document the new scheduling options.  Remove those options which are
  20   not really useful (like reverse scheduling for example).  In general
  21   the haifa scheduler adds _way_ too many options.  I'm definitely of the
  22   opinion that gcc already has too many -foptions, and haifa doesn't help
  23   that situation.
  24
  25   * Testing and benchmarking.  Haifa has received little testing inside
  26   Cygnus -- it needs to be throughly tested on a wide variety of platforms
  27   which benefit from instruction scheduling (sparc, alpha, pa, ppc, mips, x86,
  28   i960, m88k, sh, etc).    It needs to be benchmarked -- my tests showed
  29   haifa was very much a hit or miss in terms of performance improvements.
  30
  31   Some benchmarks ran significantly fasters, other significantly slower.
  32   We need to work on making haifa generate better overall code.
  33
  34   We need to have some kind of docs for how to best describe a machine to
  35   the haifa scheduler to get good performance.  Some existing ports have
  36   been tuned to deal with the old scheduler -- they may need to be tuned
  37   to generate good schedules with haifa.
  38
  39
  40
  41
  42 -------------
  43
  44 The old PROJECTS file.  Stuff I know has been done has been deleted.
  45 Stuff in progress has a contact name associated with it.
  46 has been
  47
  48 1. Better optimization.
  49
  50 * Constants in unused inline functions
  51
  52 It would be nice to delay output of string constants so that string
  53 constants mentioned in unused inline functions are never generated.
  54 Perhaps this would also take care of string constants in dead code.
  55
  56 The difficulty is in finding a clean way for the RTL which refers
  57 to the constant (currently, only by an assembler symbol name)
  58 to point to the constant and cause it to be output.
  59
  60 * More cse
  61
  62 The techniques for doing full global cse are described in the red
  63 dragon book, or (a different version) in Frederick Chow's thesis from
  64 Stanford.  It is likely to be slow and use a lot of memory, but it
  65 might be worth offering as an additional option.  Contact dje@cygnus.com
  66 before doing any work on CSE.
  67
  68 * Optimize a sequence of if statements whose conditions are exclusive.
  69
  70 It is possible to optimize
  71
  72     if (x == 1) ...;
  73     if (x == 2) ...;
  74     if (x == 3) ...;
  75
  76 into
  77
  78     if (x == 1) ...;
  79     else if (x == 2) ...;
  80     else if (x == 3) ...;
  81
  82 provided that x is not altered by the contents of the if statements.
  83
  84 It's not certain whether this is worth doing.  Perhaps programmers
  85 nearly always write the else's themselves, leaving few opportunities
  86 to improve anything.
  87
  88 * Un-cse.
  89
  90 Perhaps we should have an un-cse step right after cse, which tries to
  91 replace a reg with its value if the value can be substituted for the
  92 reg everywhere, if that looks like an improvement.  Which is if the
  93 reg is used only a few times.  Use rtx_cost to determine if the
  94 change is really an improvement.
  95
  96 * Clean up how cse works.
  97
  98 The scheme is that each value has just one hash entry.  The
  99 first_same_value and next_same_value chains are no longer needed.
 100
 101 For arithmetic, each hash table elt has the following slots:
 102
 103 * Operation.  This is an rtx code.
 104 * Mode.
 105 * Operands 0, 1 and 2.  These point to other hash table elements.
 106
 107 So, if we want to enter (PLUS:SI (REG:SI 30) (CONST_INT 104)), we
 108 first enter (CONST_INT 104) and find the entry that (REG:SI 30) now
 109 points to.  Then we put these elts into operands 0 and 1 of a new elt.
 110 We put PLUS and SI into the new elt.
 111
 112 Registers and mem refs would never be entered into the table as such.
 113 However, the values they contain would be entered.  There would be a
 114 table indexed by regno which points at the hash entry for the value in
 115 that reg.
 116
 117 The hash entry index now plays the role of a qty number.
 118 We still need qty_first_reg, reg_next_eqv, etc. to record which regs
 119 share a particular qty.
 120
 121 When a reg is used whose contents are unknown, we need to create a
 122 hash table entry whose contents say "unknown", as a place holder for
 123 whatever the reg contains.  If that reg is added to something, then
 124 the hash entry for the sum will refer to the "unknown" entry.  Use
 125 UNKNOWN for the rtx code in this entry.  This replaces make_new_qty.
 126
 127 For a constant, a unique hash entry would be made based on the
 128 value of the constant.
 129
 130 What about MEM?  Each time a memory address is referenced, we need a
 131 qty (a hash table elt) to represent what is in it.  (Just as for a
 132 register.)  If this isn't known, create one, just as for a reg whose
 133 contents are unknown.
 134
 135 We need a way to find all mem refs that still contain a certain value.
 136 Do this with a chain of hash elts (for memory addresses) that point to
 137 locations that hold the value.  The hash elt for the value itself should
 138 point to the start of the chain.  It would be good for the hash elt
 139 for an address to point to the hash elt for the contents of that address
 140 (but this ptr can be null if the contents have never been entered).
 141
 142 With this data structure, nothing need ever be invalidated except
 143 the lists of which regs or mems hold a particular value.  It is easy
 144 to see if there is a reg or mem that is equiv to a particular value.
 145 If the value is constant, it is always explicitly constant.
 146
 147 * Support more general tail-recursion among different functions.
 148
 149 This might be possible under certain circumstances, such as when
 150 the argument lists of the functions have the same lengths.
 151 Perhaps it could be done with a special declaration.
 152
 153 You would need to verify in the calling function that it does not
 154 use the addresses of any local variables and does not use setjmp.
 155
 156 * Put short statics vars at low addresses and use short addressing mode?
 157
 158 Useful on the 68000/68020 and perhaps on the 32000 series,
 159 provided one has a linker that works with the feature.
 160 This is said to make a 15% speedup on the 68000.
 161
 162 * Keep global variables in registers.
 163
 164 Here is a scheme for doing this.  A global variable, or a local variable
 165 whose address is taken, can be kept in a register for an entire function
 166 if it does not use non-constant memory addresses and (for globals only)
 167 does not call other functions.  If the entire function does not meet
 168 this criterion, a loop may.
 169
 170 The VAR_DECL for such a variable would have to have two RTL expressions:
 171 the true home in memory, and the pseudo-register used temporarily.
 172 It is necessary to emit insns to copy the memory location into the
 173 pseudo-register at the beginning of the function or loop, and perhaps
 174 back out at the end.  These insns should have REG_EQUIV notes so that,
 175 if the pseudo-register does not get a hard register, it is spilled into
 176 the memory location which exists in any case.
 177
 178 The easiest way to set up these insns is to modify the routine
 179 put_var_into_stack so that it does not apply to the entire function
 180 (sparing any loops which contain nothing dangerous) and to call it at
 181 the end of the function regardless of where in the function the
 182 address of a local variable is taken.  It would be called
 183 unconditionally at the end of the function for all relevant global
 184 variables.
 185
 186 For debugger output, the thing to do is to invent a new binding level
 187 around the appropriate loop and define the variable name as a register
 188 variable with that scope.
 189
 190 * Live-range splitting.
 191
 192 Currently a variable is allocated a hard register either for the full
 193 extent of its use or not at all.  Sometimes it would be good to
 194 allocate a variable a hard register for just part of a function; for
 195 example, through a particular loop where the variable is mostly used,
 196 or outside of a particular loop where the variable is not used.  (The
 197 latter is nice because it might let the variable be in a register most
 198 of the time even though the loop needs all the registers.)
 199
 200 Contact meissner@cygnus.com before starting any work on live range
 201 splitting.
 202
 203 * Detect dead stores into memory?
 204
 205 A store into memory is dead if it is followed by another store into
 206 the same location; and, in between, there is no reference to anything
 207 that might be that location (including no reference to a variable
 208 address).
 209
 210 * Loop optimization.
 211
 212 Strength reduction and iteration variable elimination could be
 213 smarter.  They should know how to decide which iteration variables are
 214 not worth making explicit because they can be computed as part of an
 215 address calculation.  Based on this information, they should decide
 216 when it is desirable to eliminate one iteration variable and create
 217 another in its place.
 218
 219 It should be possible to compute what the value of an iteration
 220 variable will be at the end of the loop, and eliminate the variable
 221 within the loop by computing that value at the loop end.
 222
 223 When a loop has a simple increment that adds 1,
 224 instead of jumping in after the increment,
 225 decrement the loop count and jump to the increment.
 226 This allows aob insns to be used.
 227
 228 * Using constraints on values.
 229
 230 Many operations could be simplified based on knowledge of the
 231 minimum and maximum possible values of a register at any particular time.
 232 These limits could come from the data types in the tree, via rtl generation,
 233 or they can be deduced from operations that are performed.  For example,
 234 the result of an `and' operation one of whose operands is 7 must be in
 235 the range 0 to 7.  Compare instructions also tell something about the
 236 possible values of the operand, in the code beyond the test.
 237
 238 Value constraints can be used to determine the results of a further
 239 comparison.  They can also indicate that certain `and' operations are
 240 redundant.  Constraints might permit a decrement and branch
 241 instruction that checks zeroness to be used when the user has
 242 specified to exit if negative.
 243
 244 * Smarter reload pass.
 245
 246 The reload pass as currently written can reload values only into registers
 247 that are reserved for reloading.  This means that in order to use a
 248 register for reloading it must spill everything out of that register.
 249
 250 It would be straightforward, though complicated, for reload1.c to keep
 251 track, during its scan, of which hard registers were available at each
 252 point in the function, and use for reloading even registers that were
 253 free only at the point they were needed.  This would avoid much spilling
 254 and make better code.
 255
 256 * Change the type of a variable.
 257
 258 Sometimes a variable is declared as `int', it is assigned only once
 259 from a value of type `char', and then it is used only by comparison
 260 against constants.  On many machines, better code would result if
 261 the variable had type `char'.  If the compiler could detect this
 262 case, it could change the declaration of the variable and change
 263 all the places that use it.
 264
 265 * Better handling for very sparse switches.
 266
 267 There may be cases where it would be better to compile a switch
 268 statement to use a fixed hash table rather than the current
 269 combination of jump tables and binary search.
 270
 271 * Order of subexpressions.
 272
 273 It might be possible to make better code by paying attention
 274 to the order in which to generate code for subexpressions of an expression.
 275
 276 * More code motion.
 277
 278 Consider hoisting common code up past conditional branches or
 279 tablejumps.
 280
 281 * Trace scheduling.
 282
 283 This technique is said to be able to figure out which way a jump
 284 will usually go, and rearrange the code to make that path the
 285 faster one.
 286
 287 * Distributive law.
 288
 289 The C expression *(X + 4 * (Y + C)) compiles better on certain
 290 machines if rewritten as *(X + 4*C + 4*Y) because of known addressing
 291 modes.  It may be tricky to determine when, and for which machines, to
 292 use each alternative.
 293
 294 Some work has been done on this, in combine.c.
 295
 296 * Can optimize by changing if (x) y; else z; into z; if (x) y;
 297 if z and x do not interfere and z has no effects not undone by y.
 298 This is desirable if z is faster than jumping.
 299
 300 * For a two-insn loop on the 68020, such as
 301   foo:  movb    a2@+,a3@+
 302         jne     foo
 303 it is better to insert dbeq d0,foo before the jne.
 304 d0 can be a junk register.  The challenge is to fit this into
 305 a portable framework: when can you detect this situation and
 306 still be able to allocate a junk register?
 307
 308 2. Simpler porting.
 309
 310 Right now, describing the target machine's instructions is done
 311 cleanly, but describing its addressing mode is done with several
 312 ad-hoc macro definitions.  Porting would be much easier if there were
 313 an RTL description for addressing modes like that for instructions.
 314 Tools analogous to genflags and genrecog would generate macros from
 315 this description.
 316
 317 There would be one pattern in the address-description file for each
 318 kind of addressing, and this pattern would have:
 319
 320   * the RTL expression for the address
 321   * C code to verify its validity (since that may depend on
 322     the exact data).
 323   * C code to print the address in assembler language.
 324   * C code to convert the address into a valid one, if it is not valid.
 325     (This would replace LEGITIMIZE_ADDRESS).
 326   * Register constraints for all indeterminates that appear
 327     in the RTL expression.
 328
 329 3. Other languages.
 330
 331 Front ends for Pascal, Fortran, Algol, Cobol, Modula-2 and Ada are
 332 desirable.
 333
 334 Pascal, Modula-2 and Ada require the implementation of functions
 335 within functions.  Some of the mechanisms for this already exist.
 336
 337 4. More extensions.
 338
 339 * Generated unique labels.  Have some way of generating distinct labels
 340 for use in extended asm statements.  I don't know what a good syntax would
 341 be.
 342
 343 * A way of defining a structure containing a union, in which the choice of
 344 union alternative is controlled by a previous structure component.
 345
 346 Here is a possible syntax for this.
 347
 348 struct foo {
 349   enum { INT, DOUBLE } code;
 350   auto union { case INT: int i; case DOUBLE: double d;} value : code;
 351 };
 352
 353 * Allow constructor expressions as lvalues, like this:
 354
 355         (struct foo) {a, b, c} = foo();
 356
 357 This would call foo, which returns a structure, and then store the
 358 several components of the structure into the variables a, b, and c.
 359
 360 5. Generalize the machine model.
 361
 362 * Some new compiler features may be needed to do a good job on machines
 363 where static data needs to be addressed using base registers.
 364
 365 * Some machines have two stacks in different areas of memory, one used
 366 for scalars and another for large objects.  The compiler does not
 367 now have a way to understand this.
 368
 369 6. Useful warnings.
 370
 371 * Warn about statements that are undefined because the order of
 372 evaluation of increment operators makes a big difference.  Here is an
 373 example:
 374
 375     *foo++ = hack (*foo);
 376
 377 7. Better documentation of how GCC works and how to port it.
 378
 379 Here is an outline proposed by Allan Adler.
 380
 381 I.    Overview of this document
 382 II.   The machines on which GCC is implemented
 383     A. Prose description of those characteristics of target machines and
 384        their operating systems which are pertinent to the implementation
 385        of GCC.
 386         i. target machine characteristics
 387         ii. comparison of this system of machine characteristics with
 388             other systems of machine specification currently in use
 389     B. Tables of the characteristics of the target machines on which
 390        GCC is implemented.
 391     C. A priori restrictions on the values of characteristics of target
 392        machines, with special reference to those parts of the source code
 393        which entail those restrictions
 394         i. restrictions on individual characteristics
 395         ii. restrictions involving relations between various characteristics
 396     D. The use of GCC as a cross-compiler
 397         i. cross-compilation to existing machines
 398         ii. cross-compilation to non-existent machines
 399     E. Assumptions which are made regarding the target machine
 400         i.  assumptions regarding the architecture of the target machine
 401         ii. assumptions regarding the operating system of the target machine
 402         iii. assumptions regarding software resident on the target machine
 403         iv. where in the source code these assumptions are in effect made
 404 III.   A systematic approach to writing the files tm.h and xm.h
 405     A. Macros which require special care or skill
 406     B. Examples, with special reference to the underlying reasoning
 407 IV.    A systematic approach to writing the machine description file md
 408     A. Minimal viable sets of insn descriptions
 409     B. Examples, with special reference to the underlying reasoning
 410 V.     Uses of the file aux-output.c
 411 VI.    Specification of what constitutes correct performance of an
 412        implementation of GCC
 413     A. The components of GCC
 414     B. The itinerary of a C program through GCC
 415     C. A system of benchmark programs
 416     D. What your RTL and assembler should look like with these benchmarks
 417     E. Fine tuning for speed and size of compiled code
 418 VII.   A systematic procedure for debugging an implementation of GCC
 419     A. Use of GDB
 420         i. the macros in the file .gdbinit for GCC
 421         ii. obstacles to the use of GDB
 422             a. functions implemented as macros can't be called in GDB
 423     B. Debugging without GDB
 424         i. How to turn off the normal operation of GCC and access specific
 425            parts of GCC
 426     C. Debugging tools
 427     D. Debugging the parser
 428         i. how machine macros and insn definitions affect the parser
 429     E. Debugging the recognizer
 430         i. how machine macros and insn definitions affect the recognizer
 431
 432 ditto for other components
 433
 434 VIII. Data types used by GCC, with special reference to restrictions not
 435       specified in the formal definition of the data type
 436 IX.   References to the literature for the algorithms used in GCC
 437