Analysis of cycle costs for SH4:
-> pos_result linear: 5
udiv25 -> div_ge64k_end: 15
div_ge64k_end -> rts: 13
div_le128 -> div_le128_2: 2, r1 latency 3
udiv_le128 -> div_le128_2: 2, r1 latency 3
(u)div_le128 -> div_by_1: 9
(u)div_le128 -> rts: 17
div_by_1(_neg) -> rts: 4
div_ge64k -> div_r8: 2
div_ge64k -> div_ge64k_2: 3
udiv_ge64k -> udiv_r8: 3
udiv_ge64k -> div_ge64k_2: 3 + LS
(u)div_ge64k -> div_ge64k_end: 13
udiv_r8 -> div_r8_2: 2 + LS
-> <64k div_ge64k_neg_end: 28
-> >=64k div_ge64k_neg_end: 22
div_ge64k_neg_end ft -> rts: 14
div_r8_neg_end -> rts: 4
div_r8_neg -> div_r8_neg_end: 18
div_le128_neg -> div_by_1_neg: 4
div_le128_neg -> rts: 18
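Each entry above gives the cycles spent between two labels, so the cost from a label to rts can be estimated by chaining entries. A minimal C sketch of that bookkeeping, assuming the segment costs are purely additive (the "r1 latency 3" and "+ LS" annotations are ignored); the struct, function, and array names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry from the cost list: cycles from one label to the next. */
struct segment {
    const char *from;
    const char *to;
    int cycles;
};

/* Total cycles along a chain of segments.  Assumption: costs simply add;
   extra latency and "+ LS" notes from the list are not modelled. */
static int path_cycles(const struct segment *path, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Each segment must start at the label where the previous one ended. */
        assert(i == 0 || strcmp(path[i - 1].to, path[i].from) == 0);
        total += path[i].cycles;
    }
    return total;
}

/* udiv25 -> div_ge64k_end -> rts, using the figures listed above. */
static const struct segment udiv25_to_rts[] = {
    { "udiv25",        "div_ge64k_end", 15 },
    { "div_ge64k_end", "rts",           13 },
};

/* div_r8_neg -> div_r8_neg_end -> rts. */
static const struct segment div_r8_neg_to_rts[] = {
    { "div_r8_neg",     "div_r8_neg_end", 18 },
    { "div_r8_neg_end", "rts",             4 },
};
```

For example, path_cycles(udiv25_to_rts, 2) gives 15 + 13 = 28 cycles from udiv25 to rts, and path_cycles(div_r8_neg_to_rts, 2) gives 18 + 4 = 22. Entry costs into these labels are not included.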
sh4-200, absolute divisor range:
          1    [2..128]   [129..64K)   [64K..|dividend|/256]   >=64K, >|dividend|/256
sdiv pos: 20   24         41           35                      32
sdiv neg: 15   25         42           36                      33
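The table can be read as a lookup keyed on the absolute divisor, with the last two columns split by comparing the divisor against |dividend|/256. A hedged C sketch of that lookup for the signed rows; the function names are hypothetical, and it assumes that "neg" means the operands have opposite signs (negative quotient) and that |dividend|/256 is truncating integer division:

```c
#include <assert.h>
#include <stdint.h>

/* Column index (0..4) into the sh4-200 table, chosen by |divisor|. */
static int sh4_200_column(int32_t dividend, int32_t divisor)
{
    assert(divisor != 0);
    /* Magnitudes computed in unsigned arithmetic so INT32_MIN is safe. */
    uint32_t a = dividend < 0 ? (uint32_t)0 - (uint32_t)dividend : (uint32_t)dividend;
    uint32_t d = divisor < 0 ? (uint32_t)0 - (uint32_t)divisor : (uint32_t)divisor;

    if (d == 1)
        return 0;               /* 1 */
    if (d <= 128)
        return 1;               /* [2..128] */
    if (d < 0x10000)
        return 2;               /* [129..64K) */
    return d <= a / 256 ? 3     /* [64K..|dividend|/256] */
                        : 4;    /* >=64K, >|dividend|/256 */
}

/* Cycle count for a signed division, taken from the sh4-200 rows above. */
static int sh4_200_sdiv_cycles(int32_t dividend, int32_t divisor)
{
    static const int pos[5] = { 20, 24, 41, 35, 32 };
    static const int neg[5] = { 15, 25, 42, 36, 33 };
    int col = sh4_200_column(dividend, divisor);
    /* Assumption: "neg" row = operands of opposite sign. */
    int negative = (dividend < 0) != (divisor < 0);
    return negative ? neg[col] : pos[col];
}
```

For example, sh4_200_sdiv_cycles(100000, -70000) falls in the last column (70000 > 100000/256 = 390) with a negative quotient, giving 33 cycles.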
sh4-300, absolute divisor range:
8 bit   16 bit   24 bit   > 24 bit
unsigned: 42 + 3 + 3 (lingering ftrc latency + sts fpul,rx) at the caller's site
signed: 33 + 3 + 3 (lingering ftrc latency + sts fpul,rx) at the caller's site
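Adding the two 3-cycle terms to the base cost gives the cost as seen from the caller, assuming the ftrc latency and the sts fpul,rx are simply additive:

```latex
\begin{aligned}
T_{\text{unsigned}} &= 42 + 3 + 3 = 48 \text{ cycles} \\
T_{\text{signed}}   &= 33 + 3 + 3 = 39 \text{ cycles}
\end{aligned}
```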
call-div1: divisor range:

SFUNC_STATIC call overhead:

SFUNC_GOT call overhead - current:
; 3 cycles worse than SFUNC_STATIC

SFUNC_GOT call overhead - improved assembler:
; 2 cycles worse than SFUNC_STATIC