* md.texi (Looping Patterns): New node.

[pf3gnuchains/gcc-fork.git] / gcc / md.texi
diff --git a/gcc/md.texi b/gcc/md.texi

index 5ab116a..27b57dc 100644 (file)
--- a/gcc/md.texi
+++ b/gcc/md.texi
@@ -32,6 +32,7 @@ See the next chapter for information on the C header file.
  * Pattern Ordering::    When the order of patterns makes a difference.
  * Dependent Patterns::  Having one pattern may make you need another.
  * Jump Patterns::       Special considerations for patterns for jump insns.
+* Looping Patterns::    How to define patterns for special looping insns.
  * Insn Canonicalizations::Canonicalization of Instructions
  * Expander Definitions::Generating a sequence of several RTL insns
                            for a standard operation.
@@ -2597,6 +2598,47 @@ table it uses.  Its assembler code normally has no need to use the
  second operand, but you should incorporate it in the RTL pattern so
  that the jump optimizer will not delete the table as unreachable code.
  
+
+@cindex @code{decrement_and_branch_until_zero} instruction pattern
+@item @samp{decrement_and_branch_until_zero}
+Conditional branch instruction that decrements a register and
+jumps if the register is non-zero.  Operand 0 is the register to
+decrement and test; operand 1 is the label to jump to if the
+register is non-zero.  @xref{Looping Patterns}
+
+This optional instruction pattern is only used by the combiner,
+typically for loops reversed by the loop optimizer when strength
+reduction is enabled.
+
+@cindex @code{doloop_end} instruction pattern
+@item @samp{doloop_end}
+Conditional branch instruction that decrements a register and jumps if
+the register is non-zero.  This instruction takes five operands: Operand
+0 is the register to decrement and test; operand 1 is the number of loop
+iterations as a @code{const_int} or @code{const0_rtx} if this cannot be
+determined until run-time; operand 2 is the actual or estimated maximum
+number of iterations as a @code{const_int}; operand 3 is the number of
+enclosed loops as a @code{const_int} (an innermost loop has a value of
+1); operand 4 is the label to jump to if the register is non-zero.
+@xref{Looping Patterns}
+
+This optional instruction pattern should be defined for machines with
+low-overhead looping instructions as the loop optimizer will try to
+modify suitable loops to utilize it.  If nested low-overhead looping is
+not supported, use a @code{define_expand} (@pxref{Expander Definitions})
+and make the pattern fail if operand 3 is not @code{const1_rtx}.
+Similarly, if the actual or estimated maximum number of iterations is
+too large for this instruction, make it fail.
+
+@cindex @code{doloop_begin} instruction pattern
+@item @samp{doloop_begin}
+Companion instruction to @code{doloop_end} required for machines that
+need to perform some initialisation, such as loading special registers
+used by a low-overhead looping instruction.  If initialisation insns do
+not always need to be emitted, use a @code{define_expand}
+(@pxref{Expander Definitions}) and make it fail.
+
+
  @cindex @code{canonicalize_funcptr_for_compare} instruction pattern
  @item @samp{canonicalize_funcptr_for_compare}
  Canonicalize the function pointer in operand 1 and store the result
@@ -3084,6 +3126,111 @@ discussed above, we have the pattern
  The @code{SELECT_CC_MODE} macro on the Sparc returns @code{CC_NOOVmode}
  for comparisons whose argument is a @code{plus}.
  
+@node Looping Patterns
+@section Defining Looping Instruction Patterns
+@cindex looping instruction patterns
+@cindex defining looping instruction patterns
+
+Some machines have special jump instructions that can be utilised to
+make loops more efficient.  A common example is the 68000 @samp{dbra}
+instruction which performs a decrement of a register and a branch if the
+result was greater than zero.  Other machines, in particular digital
+signal processors (DSPs), have special block repeat instructions to
+provide low-overhead loop support.  For example, the TI TMS320C3x/C4x
+DSPs have a block repeat instruction that loads special registers to
+mark the top and end of a loop and to count the number of loop
+iterations.  This avoids the need for fetching and executing a
+@samp{dbra}-like instruction and avoids pipeline stalls asociated with
+the jump.
+
+GNU CC has three special named patterns to support low overhead looping,
+@samp{decrement_and_branch_until_zero}, @samp{doloop_begin}, and
+@samp{doloop_end}.  The first pattern,
+@samp{decrement_and_branch_until_zero}, is not emitted during RTL
+generation but may be emitted during the instruction combination phase.
+This requires the assistance of the loop optimizer, using information
+collected during strength reduction, to reverse a loop to count down to
+zero.  Some targets also require the loop optimizer to add a
+@code{REG_NONNEG} note to indicate that the iteration count is always
+positive.  This is needed if the target performs a signed loop
+termination test.  For example, the 68000 uses a pattern similar to the
+following for its @code{dbra} instruction:
+
+@smallexample
+@group
+(define_insn "decrement_and_branch_until_zero"
+  [(set (pc)
+       (if_then_else
+         (ge (plus:SI (match_operand:SI 0 "general_operand" "+d*am")
+                      (const_int -1))
+             (const_int 0))
+         (label_ref (match_operand 1 "" ""))
+         (pc)))
+   (set (match_dup 0)
+       (plus:SI (match_dup 0)
+                (const_int -1)))]
+  "find_reg_note (insn, REG_NONNEG, 0)"
+  "...")
+@end group
+@end smallexample
+
+Note that since the insn is both a jump insn and has an output, it must
+deal with its own reloads, hence the `m' constraints.  Also note that
+since this insn is generated by the instruction combination phase
+combining two sequential insns together into an implicit parallel insn,
+the iteration counter needs to be biased by the same amount as the
+decrement operation, in this case -1.  Note that the following similar
+pattern will not be matched by the combiner.
+
+@smallexample
+@group
+(define_insn "decrement_and_branch_until_zero"
+  [(set (pc)
+       (if_then_else
+         (ge (match_operand:SI 0 "general_operand" "+d*am")
+             (const_int 1))
+         (label_ref (match_operand 1 "" ""))
+         (pc)))
+   (set (match_dup 0)
+       (plus:SI (match_dup 0)
+                (const_int -1)))]
+  "find_reg_note (insn, REG_NONNEG, 0)"
+  "...")
+@end group
+@end smallexample
+
+The other two special looping patterns, @samp{doloop_begin} and
+@samp{doloop_end}, are emitted by the loop optimiser for certain
+well-behaved loops with a finite number of loop iterations using
+information collected during strength reduction.  
+
+The @samp{doloop_end} pattern describes the actual looping instruction
+(or the implicit looping operation) and the @samp{doloop_begin} pattern
+is an optional companion pattern that can be used for initialisation
+needed for some low-overhead looping instructions.
+
+Note that some machines require the actual looping instruction to be
+emitted at the top of the loop (e.g., the TMS320C3x/C4x DSPs).  Emitting
+the true RTL for a looping instruction at the top of the loop can cause
+problems with flow analysis.  So instead, a dummy @code{doloop} insn is
+emitted at the end of the loop.  The machine dependent reorg pass checks
+for the presence of this @code{doloop} insn and then searches back to
+the top of the loop, where it inserts the true looping insn (provided
+there are no instructions in the loop which would cause problems).  Any
+additional labels can be emitted at this point.  In addition, if the
+desired special iteration counter register was not allocated, this
+machine dependent reorg pass could emit a traditional compare and jump
+instruction pair.
+
+The essential difference between the
+@samp{decrement_and_branch_until_zero} and the @samp{doloop_end}
+patterns is that the loop optimizer allocates an additional pseudo
+register for the latter as an iteration counter.  This pseudo register
+cannot be used within the loop (i.e., general induction variables cannot
+be derived from it), however, in many cases the loop induction variable
+may become redundant and removed by the flow pass.
+
+
  @node Insn Canonicalizations
  @section Canonicalization of Instructions
  @cindex canonicalization of instructions