1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
2 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
6 <meta name="generator" content=
7 "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
9 <title>Hash-Based Containers</title>
10 <meta http-equiv="Content-Type" content=
11 "text/html; charset=us-ascii" />
16 <h1>Hash Table Design</h1>
18 <h2><a name="overview" id="overview">Overview</a></h2>
20 <p>The collision-chaining hash-based container has the
21 following declaration.</p>
25 <b>typename</b> Mapped,
26 <b>typename</b> Hash_Fn = std::hash<Key>,
27 <b>typename</b> Eq_Fn = std::equal_to<Key>,
28 <b>typename</b> Comb_Hash_Fn = <a href=
29 "direct_mask_range_hashing.html">direct_mask_range_hashing</a><>
30 <b>typename</b> Resize_Policy = <i>default explained below.</i>
31 <b>bool</b> Store_Hash = <b>false</b>,
32 <b>typename</b> Allocator = std::allocator<<b>char</b>> >
34 "cc_hash_table.html">cc_hash_table</a>;
37 <p>The parameters have the following meaning:</p>
40 <li><tt>Key</tt> is the key type.</li>
42 <li><tt>Mapped</tt> is the mapped-policy, and is explained in
43 <a href="tutorial.html#assoc_ms">Tutorial::Associative
44 Containers::Associative Containers Others than Maps</a>.</li>
46 <li><tt>Hash_Fn</tt> is a key hashing functor.</li>
48 <li><tt>Eq_Fn</tt> is a key equivalence functor.</li>
50 <li><tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>;
51 it describes how to translate hash values into positions
52 within the table. This is described in <a href=
53 "#hash_policies">Hash Policies</a>.</li>
55 <li><tt>Resize_Policy</tt> describes how a container object
56 should change its internal size. This is described in
57 <a href="#resize_policies">Resize Policies</a>.</li>
59 <li><tt>Store_Hash</tt> indicates whether the hash value
60 should be stored with each entry. This is described in
61 <a href="#policy_interaction">Policy Interaction</a>.</li>
63 <li><tt>Allocator</tt> is an allocator
67 <p>The probing hash-based container has the following
72 <b>typename</b> Mapped,
73 <b>typename</b> Hash_Fn = std::hash<Key>,
74 <b>typename</b> Eq_Fn = std::equal_to<Key>,
75 <b>typename</b> Comb_Probe_Fn = <a href=
76 "direct_mask_range_hashing.html">direct_mask_range_hashing</a><>
77 <b>typename</b> Probe_Fn = <i>default explained below.</i>
78 <b>typename</b> Resize_Policy = <i>default explained below.</i>
79 <b>bool</b> Store_Hash = <b>false</b>,
80 <b>typename</b> Allocator = std::allocator<<b>char</b>> >
82 "gp_hash_table.html">gp_hash_table</a>;
85 <p>The parameters are identical to those of the
86 collision-chaining container, except for the following.</p>
89 <li><tt>Comb_Probe_Fn</tt> describes how to transform a probe
90 sequence into a sequence of positions within the table.</li>
92 <li><tt>Probe_Fn</tt> describes a probe sequence policy.</li>
95 <p>Some of the default template values depend on the values of
96 other parameters, and are explained in <a href=
97 "#policy_interaction">Policy Interaction</a>.</p>
99 <h2><a name="hash_policies" id="hash_policies">Hash
102 <h3><a name="general_terms" id="general_terms">General
105 <p>Following is an explanation of some functions which hashing
106 involves. Figure <a href=
107 "#hash_ranged_hash_range_hashing_fns">Hash functions,
108 ranged-hash functions, and range-hashing functions</a>)
109 illustrates the discussion.</p>
111 <h6 class="c1"><a name="hash_ranged_hash_range_hashing_fns" id=
112 "hash_ranged_hash_range_hashing_fns"><img src=
113 "hash_ranged_hash_range_hashing_fns.png" alt=
114 "no image" /></a></h6>
116 <h6 class="c1">Hash functions, ranged-hash functions, and
117 range-hashing functions.</h6>
119 <p>Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the
120 strings of 3 characters). A hash-table algorithm needs to map
121 elements of <i>U</i> "uniformly" into the range <i>[0,..., m -
122 1]</i> (where <i>m</i> is a non-negative integral value, and
123 is, in general, time varying). <i>I.e.</i>, the algorithm needs
124 a <i>ranged-hash</i> function</p>
126 <p><i>f : U × Z<sub>+</sub> → Z<sub>+</sub></i>
129 <p>such that for any <i>u</i> in <i>U</i> ,</p>
131 <p><i>0 ≤ f(u, m) ≤ m - 1</i> ,</p>
133 <p>and which has "good uniformity" properties [<a href=
134 "references.html#knuth98sorting">knuth98sorting</a>]. One
135 common solution is to use the composition of the hash
138 <p><i>h : U → Z<sub>+</sub></i> ,</p>
140 <p>which maps elements of <i>U</i> into the non-negative
143 <p class="c2">g : Z<sub>+</sub> × Z<sub>+</sub> →
146 <p>which maps a non-negative hash value, and a non-negative
147 range upper-bound into a non-negative integral in the range
148 between 0 (inclusive) and the range upper bound (exclusive),
149 <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,</p>
151 <p><i>0 ≤ g(r, m) ≤ m - 1</i> .</p>
153 <p>The resulting ranged-hash function, is</p>
155 <p><i><a name="ranged_hash_composed_of_hash_and_range_hashing"
156 id="ranged_hash_composed_of_hash_and_range_hashing">f(u , m) =
157 g(h(u), m)</a></i> (1) .</p>
159 <p>From the above, it is obvious that given <i>g</i> and
160 <i>h</i>, <i>f</i> can always be composed (however the converse
161 is not true). The STL's hash-based containers allow specifying
162 a hash function, and use a hard-wired range-hashing function;
163 the ranged-hash function is implicitly composed.</p>
165 <p>The above describes the case where a key is to be mapped
166 into a <i>single position</i> within a hash table, <i>e.g.</i>,
167 in a collision-chaining table. In other cases, a key is to be
168 mapped into a <i>sequence of positions</i> within a table,
169 <i>e.g.</i>, in a probing table. Similar terms apply in this
170 case: the table requires a <i>ranged probe</i> function,
171 mapping a key into a sequence of positions withing the table.
172 This is typically achieved by composing a <i>hash function</i>
173 mapping the key into a non-negative integral type, a
174 <i>probe</i> function transforming the hash value into a
175 sequence of hash values, and a <i>range-hashing</i> function
176 transforming the sequence of hash values into a sequence of
179 <h3><a name="range_hashing_fns" id=
180 "range_hashing_fns">Range-Hashing Functions</a></h3>
182 <p>Some common choices for range-hashing functions are the
183 division, multiplication, and middle-square methods [<a href=
184 "references.html#knuth98sorting">knuth98sorting</a>], defined
187 <p><i><a name="division_method" id="division_method">g(r, m) =
188 r mod m</a></i> (2) ,</p>
190 <p><i>g(r, m) = ⌈ u/v ( a r mod v ) ⌉</i> ,</p>
194 <p><i>g(r, m) = ⌈ u/v ( r<sup>2</sup> mod v ) ⌉</i>
197 <p>respectively, for some positive integrals <i>u</i> and
198 <i>v</i> (typically powers of 2), and some <i>a</i>. Each of
199 these range-hashing functions works best for some different
202 <p>The division method <a href="#division_method">(2)</a> is a
203 very common choice. However, even this single method can be
204 implemented in two very different ways. It is possible to
205 implement <a href="#division_method">(2)</a> using the low
206 level <i>%</i> (modulo) operation (for any <i>m</i>), or the
207 low level <i>&</i> (bit-mask) operation (for the case where
208 <i>m</i> is a power of 2), <i>i.e.</i>,</p>
210 <p><i><a name="division_method_prime_mod" id=
211 "division_method_prime_mod">g(r, m) = r % m</a></i> (3) ,</p>
215 <p><i><a name="division_method_bit_mask" id=
216 "division_method_bit_mask">g(r, m) = r & m - 1, (m =
217 2<sup>k</sup>)</a></i> for some <i>k)</i> (4),</p>
221 <p>The <i>%</i> (modulo) implementation <a href=
222 "#division_method_prime_mod">(3)</a> has the advantage that for
223 <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is
224 affected by all the bits of <i>r</i> (minimizing the chance of
225 collision). It has the disadvantage of using the costly modulo
226 operation. This method is hard-wired into SGI's implementation
227 [<a href="references.html#sgi_stl">sgi_stl</a>].</p>
229 <p>The <i>&</i> (bit-mask) implementation <a href=
230 "#division_method_bit_mask">(4)</a> has the advantage of
231 relying on the fast bit-wise and operation. It has the
232 disadvantage that for <i>g(r, m)</i> is affected only by the
233 low order bits of <i>r</i>. This method is hard-wired into
234 Dinkumware's implementation [<a href=
235 "references.html#dinkumware_stl">dinkumware_stl</a>].</p>
237 <h3><a name="hash_policies_ranged_hash_policies" id=
238 "hash_policies_ranged_hash_policies">Ranged-Hash
241 <p>In cases it is beneficial to allow the
242 client to directly specify a ranged-hash hash function. It is
243 true, that the writer of the ranged-hash function cannot rely
244 on the values of <i>m</i> having specific numerical properties
245 suitable for hashing (in the sense used in [<a href=
246 "references.html#knuth98sorting">knuth98sorting</a>]), since
247 the values of <i>m</i> are determined by a resize policy with
248 possibly orthogonal considerations.</p>
250 <p>There are two cases where a ranged-hash function can be
251 superior. The firs is when using perfect hashing [<a href=
252 "references.html#knuth98sorting">knuth98sorting</a>]; the
253 second is when the values of <i>m</i> can be used to estimate
254 the "general" number of distinct values required. This is
255 described in the following.</p>
259 <p class="c2">s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>]</p>
261 <p>be a string of <i>t</i> characters, each of which is from
262 domain <i>S</i>. Consider the following ranged-hash
265 <p><a name="total_string_dna_hash" id=
266 "total_string_dna_hash"><i>f<sub>1</sub>(s, m) = ∑ <sub>i =
267 0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
268 <i>m</i></a> (5) ,</p>
270 <p>where <i>a</i> is some non-negative integral value. This is
271 the standard string-hashing function used in SGI's
272 implementation (with <i>a = 5</i>) [<a href=
273 "references.html#sgi_stl">sgi_stl</a>]. Its advantage is that
274 it takes into account all of the characters of the string.</p>
276 <p>Now assume that <i>s</i> is the string representation of a
277 of a long DNA sequence (and so <i>S = {'A', 'C', 'G',
278 'T'}</i>). In this case, scanning the entire string might be
279 prohibitively expensive. A possible alternative might be to use
280 only the first <i>k</i> characters of the string, where</p>
282 <p>|S|<sup>k</sup> ≥ m ,</p>
284 <p><i>i.e.</i>, using the hash function</p>
286 <p><a name="only_k_string_dna_hash" id=
287 "only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = ∑ <sub>i
288 = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
289 <i>m</i></a> , (6)</p>
291 <p>requiring scanning over only</p>
293 <p><i>k =</i> log<i><sub>4</sub>( m )</i></p>
297 <p>Other more elaborate hash-functions might scan <i>k</i>
298 characters starting at a random position (determined at each
299 resize), or scanning <i>k</i> random positions (determined at
300 each resize), <i>i.e.</i>, using</p>
302 <p><i>f<sub>3</sub>(s, m) = ∑ <sub>i =
303 r</sub>0</i><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub>
304 a<sup>i</sup> mod <i>m</i> ,</p>
308 <p><i>f<sub>4</sub>(s, m) = ∑ <sub>i = 0</sub><sup>k -
309 1</sup> s<sub>r</sub>i</i> a<sup>r<sub>i</sub></sup> mod
312 <p>respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i>
313 each in the (inclusive) range <i>[0,...,t-1]</i>.</p>
315 <p>It should be noted that the above functions cannot be
316 decomposed as <a href=
317 "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a> .</p>
319 <h3><a name="pb_ds_imp" id="pb_ds_imp">Implementation</a></h3>
321 <p>This sub-subsection describes the implementation of the
322 above in <tt>pb_ds</tt>. It first explains range-hashing
323 functions in collision-chaining tables, then ranged-hash
324 functions in collision-chaining tables, then probing-based
325 tables, and, finally, lists the relevant classes in
328 <h4>Range-Hashing and Ranged-Hashes in Collision-Chaining
332 "cc_hash_table.html"><tt>cc_hash_table</tt></a> is
333 parametrized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a
334 hash functor and a combining hash functor, respectively.</p>
336 <p>In general, <tt>Comb_Hash_Fn</tt> is considered a
337 range-hashing functor. <a href=
338 "cc_hash_table.html"><tt>cc_hash_table</tt></a>
339 synthesizes a ranged-hash function from <tt>Hash_Fn</tt> and
340 <tt>Comb_Hash_Fn</tt> (see <a href=
341 "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a>
342 above). Figure <a href="#hash_range_hashing_seq_diagram">Insert
343 hash sequence diagram</a> shows an <tt>insert</tt> sequence
344 diagram for this case. The user inserts an element (point A),
345 the container transforms the key into a non-negative integral
346 using the hash functor (points B and C), and transforms the
347 result into a position using the combining functor (points D
350 <h6 class="c1"><a name="hash_range_hashing_seq_diagram" id=
351 "hash_range_hashing_seq_diagram"><img src=
352 "hash_range_hashing_seq_diagram.png" alt="no image" /></a></h6>
354 <h6 class="c1">Insert hash sequence diagram.</h6>
357 "cc_hash_table.html"><tt>cc_hash_table</tt></a>'s
358 hash-functor, <tt>Hash_Fn</tt> is instantiated by <a href=
359 "null_hash_fn.html"><tt>null_hash_fn</tt></a> (see <a href=
360 "concepts.html#concepts_null_policies">Interface::Concepts::Null
361 Policy Classes</a>), then <tt>Comb_Hash_Fn</tt> is taken to be
362 a ranged-hash function. Figure <a href=
363 "#hash_range_hashing_seq_diagram2">Insert hash sequence diagram
364 with a null hash policy</a> shows an <tt>insert</tt> sequence
365 diagram. The user inserts an element (point A), the container
366 transforms the key into a position using the combining functor
367 (points B and C).</p>
369 <h6 class="c1"><a name="hash_range_hashing_seq_diagram2" id=
370 "hash_range_hashing_seq_diagram2"><img src=
371 "hash_range_hashing_seq_diagram2.png" alt=
372 "no image" /></a></h6>
374 <h6 class="c1">Insert hash sequence diagram with a null hash
377 <h4>Probing Tables</h4>
380 "gp_hash_table.html"></a><tt>gp_hash_table</tt> is
381 parametrized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
382 <tt>Comb_Probe_Fn</tt>. As before, if <tt>Hash_Fn</tt> and
383 <tt>Probe_Fn</tt> are, respectively, <a href=
384 "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href=
385 "null_probe_fn.html"><tt>null_probe_fn</tt></a>, then
386 <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise,
387 <tt>Hash_Fn</tt> is a hash functor, <tt>Probe_Fn</tt> is a
388 functor for offsets from a hash value, and
389 <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a
390 sequence of positions within the table.</p>
392 <h4>Pre-Defined Policies</h4>
394 <p><tt>pb_ds</tt> contains some pre-defined classes
395 implementing range-hashing and probing functions:</p>
399 "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
401 "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
402 are range-hashing functions based on a bit-mask and a modulo
403 operation, respectively.</li>
406 "linear_probe_fn.html"><tt>linear_probe_fn</tt></a>, and
408 "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are
409 a linear probe and a quadratic probe function,
411 </ol>Figure <a href="#hash_policy_cd">Hash policy class
412 diagram</a> shows a class diagram.
414 <h6 class="c1"><a name="hash_policy_cd" id=
415 "hash_policy_cd"><img src="hash_policy_cd.png" alt=
416 "no image" /></a></h6>
418 <h6 class="c1">Hash policy class diagram.</h6>
420 <h2><a name="resize_policies" id="resize_policies">Resize
423 <h3><a name="general" id="general">General Terms</a></h3>
425 <p>Hash-tables, as opposed to trees, do not naturally grow or
426 shrink. It is necessary to specify policies to determine how
427 and when a hash table should change its size. Usually, resize
428 policies can be decomposed into orthogonal policies:</p>
431 <li>A <i>size policy</i> indicating <i>how</i> a hash table
432 should grow (<i>e.g.,</i> it should multiply by powers of
435 <li>A <i>trigger policy</i> indicating <i>when</i> a hash
436 table should grow (<i>e.g.,</i> a load factor is
440 <h3><a name="size_policies" id="size_policies">Size
443 <p>Size policies determine how a hash table changes size. These
444 policies are simple, and there are relatively few sensible
445 options. An exponential-size policy (with the initial size and
446 growth factors both powers of 2) works well with a mask-based
447 range-hashing function (see <a href=
448 "#hash_policies">Range-Hashing Policies</a>), and is the
449 hard-wired policy used by Dinkumware [<a href=
450 "references.html#dinkumware_stl">dinkumware_stl</a>]. A
451 prime-list based policy works well with a modulo-prime range
452 hashing function (see <a href="#hash_policies">Range-Hashing
453 Policies</a>), and is the hard-wired policy used by SGI's
454 implementation [<a href=
455 "references.html#sgi_stl">sgi_stl</a>].</p>
457 <h3><a name="trigger_policies" id="trigger_policies">Trigger
460 <p>Trigger policies determine when a hash table changes size.
461 Following is a description of two policies: <i>load-check</i>
462 policies, and collision-check policies.</p>
464 <p>Load-check policies are straightforward. The user specifies
465 two factors, <i>α<sub>min</sub></i> and
466 <i>α<sub>max</sub></i>, and the hash table maintains the
469 <p><i><a name="load_factor_min_max" id=
470 "load_factor_min_max">α<sub>min</sub> ≤ (number of
471 stored elements) / (hash-table size) ≤
472 α<sub>max</sub></a></i> (1) .</p>
474 <p>Collision-check policies work in the opposite direction of
475 load-check policies. They focus on keeping the number of
476 collisions moderate and hoping that the size of the table will
477 not grow very large, instead of keeping a moderate load-factor
478 and hoping that the number of collisions will be small. A
479 maximal collision-check policy resizes when the longest
480 probe-sequence grows too large.</p>
482 <p>Consider Figure <a href="#balls_and_bins">Balls and
483 bins</a>. Let the size of the hash table be denoted by
484 <i>m</i>, the length of a probe sequence be denoted by
485 <i>k</i>, and some load factor be denoted by α. We would
486 like to calculate the minimal length of <i>k</i>, such that if
487 there were <i>α m</i> elements in the hash table, a probe
488 sequence of length <i>k</i> would be found with probability at
491 <h6 class="c1"><a name="balls_and_bins" id=
492 "balls_and_bins"><img src="balls_and_bins.png" alt=
493 "no image" /></a></h6>
495 <h6 class="c1">Balls and bins.</h6>
497 <p>Denote the probability that a probe sequence of length
498 <i>k</i> appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the
499 length of the probe sequence of bin <i>i</i> by
500 <i>l<sub>i</sub></i>, and assume uniform distribution. Then</p>
502 <p><a name="prob_of_p1" id=
503 "prob_of_p1"><i>p<sub>1</sub></i></a> = (3)</p>
505 <p class="c2"><b>P</b>(l<sub>1</sub> ≥ k) =</p>
507 <p><i><b>P</b>(l<sub>1</sub> ≥ α ( 1 + k / α - 1
510 <p><i>e ^ ( - ( α ( k / α - 1 )<sup>2</sup> ) /2
513 <p>where (a) follows from the Chernoff bound [<a href=
514 "references.html#motwani95random">motwani95random</a>]. To
515 calculate the probability that <i>some</i> bin contains a probe
516 sequence greater than <i>k</i>, we note that the
517 <i>l<sub>i</sub></i> are negatively-dependent [<a href=
518 "references.html#dubhashi98neg">dubhashi98neg</a>]. Let
519 <i><b>I</b>(.)</i> denote the indicator function. Then</p>
521 <p><a name="at_least_k_i_n_some_bin" id=
522 "at_least_k_i_n_some_bin"><i><b>P</b>( exists<sub>i</sub>
523 l<sub>i</sub> ≥ k ) =</i> (3)</a></p>
525 <p class="c2"><b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup>
526 <b>I</b>(l<sub>i</sub> ≥ k) ≥ 1 ) =</p>
528 <p><i><b>P</b> ( ∑ <sub>i = 1</sub><sup>m</sup> <b>I</b> (
529 l<sub>i</sub> ≥ k ) ≥ m p<sub>1</sub> ( 1 + 1 / (m
530 p<sub>1</sub>) - 1 ) ) ≤</i> (a)</p>
532 <p class="c2">e ^ ( ( - m p<sub>1</sub> ( 1 / (m p<sub>1</sub>)
533 - 1 ) <sup>2</sup> ) / 2 ) ,</p>
535 <p>where (a) follows from the fact that the Chernoff bound can
536 be applied to negatively-dependent variables [<a href=
537 "references.html#dubhashi98neg">dubhashi98neg</a>]. Inserting
538 <a href="#prob_of_p1">(2)</a> into <a href=
539 "#at_least_k_i_n_some_bin">(3)</a>, and equating with
540 <i>1/m</i>, we obtain</p>
542 <p><i>k ~ √ ( 2 α</i> ln <i>2 m</i> ln<i>(m) )
545 <h3><a name="imp_pb_ds" id="imp_pb_ds">Implementation</a></h3>
547 <p>This sub-subsection describes the implementation of the
548 above in <tt>pb_ds</tt>. It first describes resize policies and
549 their decomposition into trigger and size policies, then
550 describes pre-defined classes, and finally discusses controlled
551 access the policies' internals.</p>
553 <h4>Resize Policies and Their Decomposition</h4>
555 <p>Each hash-based container is parametrized by a
556 <tt>Resize_Policy</tt> parameter; the container derives
557 <tt><b>public</b></tt>ly from <tt>Resize_Policy</tt>. For
560 <a href="cc_hash_table.html">cc_hash_table</a><
562 <b>typename</b> Mapped,
564 <b>typename</b> Resize_Policy
566 <b>public</b> Resize_Policy
569 <p>As a container object is modified, it continuously notifies
570 its <tt>Resize_Policy</tt> base of internal changes
571 (<i>e.g.</i>, collisions encountered and elements being
572 inserted). It queries its <tt>Resize_Policy</tt> base whether
573 it needs to be resized, and if so, to what size.</p>
575 <p>Figure <a href="#insert_resize_sequence_diagram1">Insert
576 resize sequence diagram</a> shows a (possible) sequence diagram
577 of an insert operation. The user inserts an element; the hash
578 table notifies its resize policy that a search has started
579 (point A); in this case, a single collision is encountered -
580 the table notifies its resize policy of this (point B); the
581 container finally notifies its resize policy that the search
582 has ended (point C); it then queries its resize policy whether
583 a resize is needed, and if so, what is the new size (points D
584 to G); following the resize, it notifies the policy that a
585 resize has completed (point H); finally, the element is
586 inserted, and the policy notified (point I).</p>
588 <h6 class="c1"><a name="insert_resize_sequence_diagram1" id=
589 "insert_resize_sequence_diagram1"><img src=
590 "insert_resize_sequence_diagram1.png" alt=
591 "no image" /></a></h6>
593 <h6 class="c1">Insert resize sequence diagram.</h6>
595 <p>In practice, a resize policy can be usually orthogonally
596 decomposed to a size policy and a trigger policy. Consequently,
597 the library contains a single class for instantiating a resize
599 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
600 is parametrized by <tt>Size_Policy</tt> and
601 <tt>Trigger_Policy</tt>, derives <tt><b>public</b></tt>ly from
602 both, and acts as a standard delegate [<a href=
603 "references.html#gamma95designpatterns">gamma95designpatterns</a>]
604 to these policies.</p>
606 <p>Figures <a href="#insert_resize_sequence_diagram2">Standard
607 resize policy trigger sequence diagram</a> and <a href=
608 "#insert_resize_sequence_diagram3">Standard resize policy size
609 sequence diagram</a> show sequence diagrams illustrating the
610 interaction between the standard resize policy and its trigger
611 and size policies, respectively.</p>
613 <h6 class="c1"><a name="insert_resize_sequence_diagram2" id=
614 "insert_resize_sequence_diagram2"><img src=
615 "insert_resize_sequence_diagram2.png" alt=
616 "no image" /></a></h6>
618 <h6 class="c1">Standard resize policy trigger sequence
621 <h6 class="c1"><a name="insert_resize_sequence_diagram3" id=
622 "insert_resize_sequence_diagram3"><img src=
623 "insert_resize_sequence_diagram3.png" alt=
624 "no image" /></a></h6>
626 <h6 class="c1">Standard resize policy size sequence
629 <h4>Pre-Defined Policies</h4>
631 <p>The library includes the following
632 instantiations of size and trigger policies:</p>
636 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
637 implements a load check trigger policy.</li>
640 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
641 implements a collision check trigger policy.</li>
644 "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>
645 implements an exponential-size policy (which should be used
646 with mask range hashing).</li>
649 "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>
650 implementing a size policy based on a sequence of primes
651 [<a href="references.html#sgi_stl">sgi_stl</a>] (which should
652 be used with mod range hashing</li>
655 <p>Figure <a href="#resize_policy_cd">Resize policy class
656 diagram</a> gives an overall picture of the resize-related
658 "basic_hash_table.html"><tt>basic_hash_table</tt></a>
659 is parametrized by <tt>Resize_Policy</tt>, which it subclasses
660 publicly. This class is currently instantiated only by <a href=
661 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
663 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
664 itself is parametrized by <tt>Trigger_Policy</tt> and
665 <tt>Size_Policy</tt>. Currently, <tt>Trigger_Policy</tt> is
666 instantiated by <a href=
667 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
669 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>;
670 <tt>Size_Policy</tt> is instantiated by <a href=
671 "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
673 "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.</p>
675 <h6 class="c1"><a name="resize_policy_cd" id=
676 "resize_policy_cd"><img src="resize_policy_cd.png" alt=
677 "no image" /></a></h6>
679 <h6 class="c1">Resize policy class diagram.</h6>
681 <h4>Controlled Access to Policies' Internals</h4>
683 <p>There are cases where (controlled) access to resize
684 policies' internals is beneficial. <i>E.g.</i>, it is sometimes
685 useful to query a hash-table for the table's actual size (as
686 opposed to its <tt>size()</tt> - the number of values it
687 currently holds); it is sometimes useful to set a table's
688 initial size, externally resize it, or change load factors.</p>
690 <p>Clearly, supporting such methods both decreases the
691 encapsulation of hash-based containers, and increases the
692 diversity between different associative-containers' interfaces.
693 Conversely, omitting such methods can decrease containers'
696 <p>In order to avoid, to the extent possible, the above
697 conflict, the hash-based containers themselves do not address
698 any of these questions; this is deferred to the resize policies,
699 which are easier to change or replace. Thus, for example,
701 "cc_hash_table.html"><tt>cc_hash_table</tt></a> nor
703 "gp_hash_table.html"><tt>gp_hash_table</tt></a>
704 contain methods for querying the actual size of the table; this
705 is deferred to <a href=
706 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.</p>
708 <p>Furthermore, the policies themselves are parametrized by
709 template arguments that determine the methods they support
711 "references.html#alexandrescu01modern">alexandrescu01modern</a>]
712 shows techniques for doing so). <a href=
713 "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
714 is parametrized by <tt>External_Size_Access</tt> that
715 determines whether it supports methods for querying the actual
716 size of the table or resizing it. <a href=
717 "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
718 is parametrized by <tt>External_Load_Access</tt> that
719 determines whether it supports methods for querying or
720 modifying the loads. <a href=
721 "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
722 is parametrized by <tt>External_Load_Access</tt> that
723 determines whether it supports methods for querying the
726 <p>Some operations, for example, resizing a container at
727 run time, or changing the load factors of a load-check trigger
728 policy, require the container itself to resize. As mentioned
729 above, the hash-based containers themselves do not contain
730 these types of methods, only their resize policies.
731 Consequently, there must be some mechanism for a resize policy
732 to manipulate the hash-based container. As the hash-based
733 container is a subclass of the resize policy, this is done
734 through virtual methods. Each hash-based container has a
735 <tt><b>private</b></tt> <tt><b>virtual</b></tt> method:</p>
739 (size_type new_size);
742 <p>which resizes the container. Implementations of
743 <tt>Resize_Policy</tt> can export public methods for resizing
744 the container externally; these methods internally call
745 <tt>do_resize</tt> to resize the table.</p>
747 <h2><a name="policy_interaction" id="policy_interaction">Policy
750 <p>Hash-tables are unfortunately especially susceptible to
751 choice of policies. One of the more complicated aspects of this
752 is that poor combinations of good policies can form a poor
753 container. Following are some considerations.</p>
755 <h3><a name="policy_interaction_probe_size_trigger" id=
756 "policy_interaction_probe_size_trigger">Probe Policies, Size
757 Policies, and Trigger Policies</a></h3>
759 <p>Some combinations do not work well for probing containers.
760 For example, combining a quadratic probe policy with an
761 exponential size policy can yield a poor container: when an
762 element is inserted, a trigger policy might decide that there
763 is no need to resize, as the table still contains unused
764 entries; the probe sequence, however, might never reach any of
765 the unused entries.</p>
767 <p>Unfortunately, <tt>pb_ds</tt> cannot detect such problems at
768 compilation (they are halting reducible). It therefore defines
769 an exception class <a href=
770 "insert_error.html"><tt>insert_error</tt></a> to throw an
771 exception in this case.</p>
773 <h3><a name="policy_interaction_hash_trigger" id=
774 "policy_interaction_hash_trigger">Hash Policies and Trigger
777 <p>Some trigger policies are especially susceptible to poor
778 hash functions. Suppose, as an extreme case, that the hash
779 function transforms each key to the same hash value. After some
780 inserts, a collision detecting policy will always indicate that
781 the container needs to grow.</p>
783 <p>The library, therefore, by design, limits each operation to
784 one resize. For each <tt>insert</tt>, for example, it queries
785 only once whether a resize is needed.</p>
787 <h3><a name="policy_interaction_eq_sth_hash" id=
788 "policy_interaction_eq_sth_hash">Equivalence Functors, Storing
789 Hash Values, and Hash Functions</a></h3>
792 "cc_hash_table.html"><tt>cc_hash_table</tt></a> and
794 "gp_hash_table.html"><tt>gp_hash_table</tt></a> are
795 parametrized by an equivalence functor and by a
796 <tt>Store_Hash</tt> parameter. If the latter parameter is
797 <tt><b>true</b></tt>, then the container stores with each entry
798 a hash value, and uses this value in case of collisions to
799 determine whether to apply a hash value. This can lower the
800 cost of collision for some types, but increase the cost of
801 collisions for other types.</p>
803 <p>If a ranged-hash function or ranged probe function is
804 directly supplied, however, then it makes no sense to store the
805 hash value with each entry. <tt>pb_ds</tt>'s container will
806 fail at compilation, by design, if this is attempted.</p>
808 <h3><a name="policy_interaction_size_load_check" id=
809 "policy_interaction_size_load_check">Size Policies and
810 Load-Check Trigger Policies</a></h3>
812 <p>Assume a size policy issues an increasing sequence of sizes
813 <i>a, a q, a q<sup>1</sup>, a q<sup>2</sup>, ...</i> For
814 example, an exponential size policy might issue the sequence of
815 sizes <i>8, 16, 32, 64, ...</i></p>
817 <p>If a load-check trigger policy is used, with loads
818 <i>α<sub>min</sub></i> and <i>α<sub>max</sub></i>,
819 respectively, then it is a good idea to have:</p>
822 <li><i>α<sub>max</sub> ~ 1 / q</i></li>
824 <li><i>α<sub>min</sub> < 1 / (2 q)</i></li>
827 <p>This will ensure that the amortized hash cost of each
828 modifying operation is at most approximately 3.</p>
830 <p><i>α<sub>min</sub> ~ α<sub>max</sub></i> is, in
831 any case, a bad choice, and <i>α<sub>min</sub> >
832 α<sub>max</sub></i> is horrendous.</p>