libstdc++-v3/doc/html/ext/pb_ds/hash_based_containers.html

   1 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   2     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   3
   4 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   5 <head>
   6   <meta name="generator" content=
   7   "HTML Tidy for Linux/x86 (vers 12 April 2005), see www.w3.org" />
   8
   9   <title>Hash-Based Containers</title>
  10   <meta http-equiv="Content-Type" content=
  11   "text/html; charset=us-ascii" />
  12   </head>
  13
  14 <body>
  15   <div id="page">
  16     <h1>Hash Table Design</h1>
  17
  18     <h2><a name="overview" id="overview">Overview</a></h2>
  19
  20     <p>The collision-chaining hash-based container has the
  21     following declaration.</p>
  22     <pre>
  23 <b>template</b>&lt;
  24     <b>typename</b> Key,
  25     <b>typename</b> Mapped,
  26     <b>typename</b> Hash_Fn = std::hash&lt;Key&gt;,
  27     <b>typename</b> Eq_Fn = std::equal_to&lt;Key&gt;,
  28     <b>typename</b> Comb_Hash_Fn = <a href=
  29 "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
  30     <b>typename</b> Resize_Policy = <i>default explained below.</i>
  31      <b>bool</b> Store_Hash = <b>false</b>,
  32      <b>typename</b> Allocator = std::allocator&lt;<b>char</b>&gt; &gt;
  33 <b>class</b> <a href=
  34 "cc_hash_table.html">cc_hash_table</a>;
  35 </pre>
  36
  37     <p>The parameters have the following meaning:</p>
  38
  39     <ol>
  40       <li><tt>Key</tt> is the key type.</li>
  41
  42       <li><tt>Mapped</tt> is the mapped-policy, and is explained in
  43       <a href="tutorial.html#assoc_ms">Tutorial::Associative
  44       Containers::Associative Containers Others than Maps</a>.</li>
  45
  46       <li><tt>Hash_Fn</tt> is a key hashing functor.</li>
  47
  48       <li><tt>Eq_Fn</tt> is a key equivalence functor.</li>
  49
  50       <li><tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>;
  51       it describes how to translate hash values into positions
  52       within the table. This is described in <a href=
  53       "#hash_policies">Hash Policies</a>.</li>
  54
  55       <li><tt>Resize_Policy</tt> describes how a container object
  56       should change its internal size. This is described in
  57       <a href="#resize_policies">Resize Policies</a>.</li>
  58
  59       <li><tt>Store_Hash</tt> indicates whether the hash value
  60       should be stored with each entry. This is described in
  61       <a href="#policy_interaction">Policy Interaction</a>.</li>
  62
  63       <li><tt>Allocator</tt> is an allocator
  64       type.</li>
  65     </ol>
  66
  67     <p>The probing hash-based container has the following
  68     declaration.</p>
  69     <pre>
  70 <b>template</b>&lt;
  71     <b>typename</b> Key,
  72     <b>typename</b> Mapped,
  73     <b>typename</b> Hash_Fn = std::hash&lt;Key&gt;,
  74     <b>typename</b> Eq_Fn = std::equal_to&lt;Key&gt;,
  75     <b>typename</b> Comb_Probe_Fn = <a href=
  76 "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
  77     <b>typename</b> Probe_Fn = <i>default explained below.</i>
  78     <b>typename</b> Resize_Policy = <i>default explained below.</i>
  79     <b>bool</b> Store_Hash = <b>false</b>,
  80     <b>typename</b> Allocator =  std::allocator&lt;<b>char</b>&gt; &gt;
  81 <b>class</b> <a href=
  82 "gp_hash_table.html">gp_hash_table</a>;
  83 </pre>
  84
  85     <p>The parameters are identical to those of the
  86     collision-chaining container, except for the following.</p>
  87
  88     <ol>
  89       <li><tt>Comb_Probe_Fn</tt> describes how to transform a probe
  90       sequence into a sequence of positions within the table.</li>
  91
  92       <li><tt>Probe_Fn</tt> describes a probe sequence policy.</li>
  93     </ol>
  94
  95     <p>Some of the default template values depend on the values of
  96     other parameters, and are explained in <a href=
  97     "#policy_interaction">Policy Interaction</a>.</p>
  98
  99     <h2><a name="hash_policies" id="hash_policies">Hash
 100     Policies</a></h2>
 101
 102     <h3><a name="general_terms" id="general_terms">General
 103     Terms</a></h3>
 104
 105     <p>Following is an explanation of some functions which hashing
 106     involves. Figure <a href=
 107     "#hash_ranged_hash_range_hashing_fns">Hash functions,
 108     ranged-hash functions, and range-hashing functions</a>)
 109     illustrates the discussion.</p>
 110
 111     <h6 class="c1"><a name="hash_ranged_hash_range_hashing_fns" id=
 112     "hash_ranged_hash_range_hashing_fns"><img src=
 113     "hash_ranged_hash_range_hashing_fns.png" alt=
 114     "no image" /></a></h6>
 115
 116     <h6 class="c1">Hash functions, ranged-hash functions, and
 117     range-hashing functions.</h6>
 118
 119     <p>Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the
 120     strings of 3 characters). A hash-table algorithm needs to map
 121     elements of <i>U</i> "uniformly" into the range <i>[0,..., m -
 122     1]</i> (where <i>m</i> is a non-negative integral value, and
 123     is, in general, time varying). <i>I.e.</i>, the algorithm needs
 124     a <i>ranged-hash</i> function</p>
 125
 126     <p><i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub></i>
 127     ,</p>
 128
 129     <p>such that for any <i>u</i> in <i>U</i> ,</p>
 130
 131     <p><i>0 &le; f(u, m) &le; m - 1</i> ,</p>
 132
 133     <p>and which has "good uniformity" properties [<a href=
 134     "references.html#knuth98sorting">knuth98sorting</a>]. One
 135     common solution is to use the composition of the hash
 136     function</p>
 137
 138     <p><i>h : U &rarr; Z<sub>+</sub></i> ,</p>
 139
 140     <p>which maps elements of <i>U</i> into the non-negative
 141     integrals, and</p>
 142
 143     <p class="c2">g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr;
 144     Z<sub>+</sub>,</p>
 145
 146     <p>which maps a non-negative hash value, and a non-negative
 147     range upper-bound into a non-negative integral in the range
 148     between 0 (inclusive) and the range upper bound (exclusive),
 149     <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,</p>
 150
 151     <p><i>0 &le; g(r, m) &le; m - 1</i> .</p>
 152
 153     <p>The resulting ranged-hash function, is</p>
 154
 155     <p><i><a name="ranged_hash_composed_of_hash_and_range_hashing"
 156     id="ranged_hash_composed_of_hash_and_range_hashing">f(u , m) =
 157     g(h(u), m)</a></i> (1) .</p>
 158
 159     <p>From the above, it is obvious that given <i>g</i> and
 160     <i>h</i>, <i>f</i> can always be composed (however the converse
 161     is not true). The STL's hash-based containers allow specifying
 162     a hash function, and use a hard-wired range-hashing function;
 163     the ranged-hash function is implicitly composed.</p>
 164
 165     <p>The above describes the case where a key is to be mapped
 166     into a <i>single position</i> within a hash table, <i>e.g.</i>,
 167     in a collision-chaining table. In other cases, a key is to be
 168     mapped into a <i>sequence of positions</i> within a table,
 169     <i>e.g.</i>, in a probing table. Similar terms apply in this
 170     case: the table requires a <i>ranged probe</i> function,
 171     mapping a key into a sequence of positions withing the table.
 172     This is typically achieved by composing a <i>hash function</i>
 173     mapping the key into a non-negative integral type, a
 174     <i>probe</i> function transforming the hash value into a
 175     sequence of hash values, and a <i>range-hashing</i> function
 176     transforming the sequence of hash values into a sequence of
 177     positions.</p>
 178
 179     <h3><a name="range_hashing_fns" id=
 180     "range_hashing_fns">Range-Hashing Functions</a></h3>
 181
 182     <p>Some common choices for range-hashing functions are the
 183     division, multiplication, and middle-square methods [<a href=
 184     "references.html#knuth98sorting">knuth98sorting</a>], defined
 185     as</p>
 186
 187     <p><i><a name="division_method" id="division_method">g(r, m) =
 188     r mod m</a></i> (2) ,</p>
 189
 190     <p><i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil;</i> ,</p>
 191
 192     <p>and</p>
 193
 194     <p><i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil;</i>
 195     ,</p>
 196
 197     <p>respectively, for some positive integrals <i>u</i> and
 198     <i>v</i> (typically powers of 2), and some <i>a</i>. Each of
 199     these range-hashing functions works best for some different
 200     setting.</p>
 201
 202     <p>The division method <a href="#division_method">(2)</a> is a
 203     very common choice. However, even this single method can be
 204     implemented in two very different ways. It is possible to
 205     implement <a href="#division_method">(2)</a> using the low
 206     level <i>%</i> (modulo) operation (for any <i>m</i>), or the
 207     low level <i>&amp;</i> (bit-mask) operation (for the case where
 208     <i>m</i> is a power of 2), <i>i.e.</i>,</p>
 209
 210     <p><i><a name="division_method_prime_mod" id=
 211     "division_method_prime_mod">g(r, m) = r % m</a></i> (3) ,</p>
 212
 213     <p>and</p>
 214
 215     <p><i><a name="division_method_bit_mask" id=
 216     "division_method_bit_mask">g(r, m) = r &amp; m - 1, (m =
 217     2<sup>k</sup>)</a></i> for some <i>k)</i> (4),</p>
 218
 219     <p>respectively.</p>
 220
 221     <p>The <i>%</i> (modulo) implementation <a href=
 222     "#division_method_prime_mod">(3)</a> has the advantage that for
 223     <i>m</i> a prime far from a power of 2, <i>g(r, m)</i> is
 224     affected by all the bits of <i>r</i> (minimizing the chance of
 225     collision). It has the disadvantage of using the costly modulo
 226     operation. This method is hard-wired into SGI's implementation
 227     [<a href="references.html#sgi_stl">sgi_stl</a>].</p>
 228
 229     <p>The <i>&amp;</i> (bit-mask) implementation <a href=
 230     "#division_method_bit_mask">(4)</a> has the advantage of
 231     relying on the fast bit-wise and operation. It has the
 232     disadvantage that for <i>g(r, m)</i> is affected only by the
 233     low order bits of <i>r</i>. This method is hard-wired into
 234     Dinkumware's implementation [<a href=
 235     "references.html#dinkumware_stl">dinkumware_stl</a>].</p>
 236
 237     <h3><a name="hash_policies_ranged_hash_policies" id=
 238     "hash_policies_ranged_hash_policies">Ranged-Hash
 239     Functions</a></h3>
 240
 241     <p>In cases it is beneficial to allow the
 242     client to directly specify a ranged-hash hash function. It is
 243     true, that the writer of the ranged-hash function cannot rely
 244     on the values of <i>m</i> having specific numerical properties
 245     suitable for hashing (in the sense used in [<a href=
 246     "references.html#knuth98sorting">knuth98sorting</a>]), since
 247     the values of <i>m</i> are determined by a resize policy with
 248     possibly orthogonal considerations.</p>
 249
 250     <p>There are two cases where a ranged-hash function can be
 251     superior. The firs is when using perfect hashing [<a href=
 252     "references.html#knuth98sorting">knuth98sorting</a>]; the
 253     second is when the values of <i>m</i> can be used to estimate
 254     the "general" number of distinct values required. This is
 255     described in the following.</p>
 256
 257     <p>Let</p>
 258
 259     <p class="c2">s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>]</p>
 260
 261     <p>be a string of <i>t</i> characters, each of which is from
 262     domain <i>S</i>. Consider the following ranged-hash
 263     function:</p>
 264
 265     <p><a name="total_string_dna_hash" id=
 266     "total_string_dna_hash"><i>f<sub>1</sub>(s, m) = &sum; <sub>i =
 267     0</sub><sup>t - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
 268     <i>m</i></a> (5) ,</p>
 269
 270     <p>where <i>a</i> is some non-negative integral value. This is
 271     the standard string-hashing function used in SGI's
 272     implementation (with <i>a = 5</i>) [<a href=
 273     "references.html#sgi_stl">sgi_stl</a>]. Its advantage is that
 274     it takes into account all of the characters of the string.</p>
 275
 276     <p>Now assume that <i>s</i> is the string representation of a
 277     of a long DNA sequence (and so <i>S = {'A', 'C', 'G',
 278     'T'}</i>). In this case, scanning the entire string might be
 279     prohibitively expensive. A possible alternative might be to use
 280     only the first <i>k</i> characters of the string, where</p>
 281
 282     <p>|S|<sup>k</sup> &ge; m ,</p>
 283
 284     <p><i>i.e.</i>, using the hash function</p>
 285
 286     <p><a name="only_k_string_dna_hash" id=
 287     "only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i
 288     = 0</sub><sup>k - 1</sup> s<sub>i</sub> a<sup>i</sup></i> mod
 289     <i>m</i></a> , (6)</p>
 290
 291     <p>requiring scanning over only</p>
 292
 293     <p><i>k =</i> log<i><sub>4</sub>( m )</i></p>
 294
 295     <p>characters.</p>
 296
 297     <p>Other more elaborate hash-functions might scan <i>k</i>
 298     characters starting at a random position (determined at each
 299     resize), or scanning <i>k</i> random positions (determined at
 300     each resize), <i>i.e.</i>, using</p>
 301
 302     <p><i>f<sub>3</sub>(s, m) = &sum; <sub>i =
 303     r</sub>0</i><sup>r<sub>0</sub> + k - 1</sup> s<sub>i</sub>
 304     a<sup>i</sup> mod <i>m</i> ,</p>
 305
 306     <p>or</p>
 307
 308     <p><i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k -
 309     1</sup> s<sub>r</sub>i</i> a<sup>r<sub>i</sub></sup> mod
 310     <i>m</i> ,</p>
 311
 312     <p>respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i>
 313     each in the (inclusive) range <i>[0,...,t-1]</i>.</p>
 314
 315     <p>It should be noted that the above functions cannot be
 316     decomposed as <a href=
 317     "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a> .</p>
 318
 319     <h3><a name="pb_ds_imp" id="pb_ds_imp">Implementation</a></h3>
 320
 321     <p>This sub-subsection describes the implementation of the
 322     above in <tt>pb_ds</tt>. It first explains range-hashing
 323     functions in collision-chaining tables, then ranged-hash
 324     functions in collision-chaining tables, then probing-based
 325     tables, and, finally, lists the relevant classes in
 326     <tt>pb_ds</tt>.</p>
 327
 328     <h4>Range-Hashing and Ranged-Hashes in Collision-Chaining
 329     Tables</h4>
 330
 331     <p><a href=
 332     "cc_hash_table.html"><tt>cc_hash_table</tt></a> is
 333     parametrized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a
 334     hash functor and a combining hash functor, respectively.</p>
 335
 336     <p>In general, <tt>Comb_Hash_Fn</tt> is considered a
 337     range-hashing functor. <a href=
 338     "cc_hash_table.html"><tt>cc_hash_table</tt></a>
 339     synthesizes a ranged-hash function from <tt>Hash_Fn</tt> and
 340     <tt>Comb_Hash_Fn</tt> (see <a href=
 341     "#ranged_hash_composed_of_hash_and_range_hashing">(1)</a>
 342     above). Figure <a href="#hash_range_hashing_seq_diagram">Insert
 343     hash sequence diagram</a> shows an <tt>insert</tt> sequence
 344     diagram for this case. The user inserts an element (point A),
 345     the container transforms the key into a non-negative integral
 346     using the hash functor (points B and C), and transforms the
 347     result into a position using the combining functor (points D
 348     and E).</p>
 349
 350     <h6 class="c1"><a name="hash_range_hashing_seq_diagram" id=
 351     "hash_range_hashing_seq_diagram"><img src=
 352     "hash_range_hashing_seq_diagram.png" alt="no image" /></a></h6>
 353
 354     <h6 class="c1">Insert hash sequence diagram.</h6>
 355
 356     <p>If <a href=
 357     "cc_hash_table.html"><tt>cc_hash_table</tt></a>'s
 358     hash-functor, <tt>Hash_Fn</tt> is instantiated by <a href=
 359     "null_hash_fn.html"><tt>null_hash_fn</tt></a> (see <a href=
 360     "concepts.html#concepts_null_policies">Interface::Concepts::Null
 361     Policy Classes</a>), then <tt>Comb_Hash_Fn</tt> is taken to be
 362     a ranged-hash function. Figure <a href=
 363     "#hash_range_hashing_seq_diagram2">Insert hash sequence diagram
 364     with a null hash policy</a> shows an <tt>insert</tt> sequence
 365     diagram. The user inserts an element (point A), the container
 366     transforms the key into a position using the combining functor
 367     (points B and C).</p>
 368
 369     <h6 class="c1"><a name="hash_range_hashing_seq_diagram2" id=
 370     "hash_range_hashing_seq_diagram2"><img src=
 371     "hash_range_hashing_seq_diagram2.png" alt=
 372     "no image" /></a></h6>
 373
 374     <h6 class="c1">Insert hash sequence diagram with a null hash
 375     policy.</h6>
 376
 377     <h4>Probing Tables</h4>
 378
 379     <p><a href=
 380     "gp_hash_table.html"></a><tt>gp_hash_table</tt> is
 381     parametrized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
 382     <tt>Comb_Probe_Fn</tt>. As before, if <tt>Hash_Fn</tt> and
 383     <tt>Probe_Fn</tt> are, respectively, <a href=
 384     "null_hash_fn.html"><tt>null_hash_fn</tt></a> and <a href=
 385     "null_probe_fn.html"><tt>null_probe_fn</tt></a>, then
 386     <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise,
 387     <tt>Hash_Fn</tt> is a hash functor, <tt>Probe_Fn</tt> is a
 388     functor for offsets from a hash value, and
 389     <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a
 390     sequence of positions within the table.</p>
 391
 392     <h4>Pre-Defined Policies</h4>
 393
 394     <p><tt>pb_ds</tt> contains some pre-defined classes
 395     implementing range-hashing and probing functions:</p>
 396
 397     <ol>
 398       <li><a href=
 399       "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
 400       and <a href=
 401       "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
 402       are range-hashing functions based on a bit-mask and a modulo
 403       operation, respectively.</li>
 404
 405       <li><a href=
 406       "linear_probe_fn.html"><tt>linear_probe_fn</tt></a>, and
 407       <a href=
 408       "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> are
 409       a linear probe and a quadratic probe function,
 410       respectively.</li>
 411     </ol>Figure <a href="#hash_policy_cd">Hash policy class
 412     diagram</a> shows a class diagram.
 413
 414     <h6 class="c1"><a name="hash_policy_cd" id=
 415     "hash_policy_cd"><img src="hash_policy_cd.png" alt=
 416     "no image" /></a></h6>
 417
 418     <h6 class="c1">Hash policy class diagram.</h6>
 419
 420     <h2><a name="resize_policies" id="resize_policies">Resize
 421     Policies</a></h2>
 422
 423     <h3><a name="general" id="general">General Terms</a></h3>
 424
 425     <p>Hash-tables, as opposed to trees, do not naturally grow or
 426     shrink. It is necessary to specify policies to determine how
 427     and when a hash table should change its size. Usually, resize
 428     policies can be decomposed into orthogonal policies:</p>
 429
 430     <ol>
 431       <li>A <i>size policy</i> indicating <i>how</i> a hash table
 432       should grow (<i>e.g.,</i> it should multiply by powers of
 433       2).</li>
 434
 435       <li>A <i>trigger policy</i> indicating <i>when</i> a hash
 436       table should grow (<i>e.g.,</i> a load factor is
 437       exceeded).</li>
 438     </ol>
 439
 440     <h3><a name="size_policies" id="size_policies">Size
 441     Policies</a></h3>
 442
 443     <p>Size policies determine how a hash table changes size. These
 444     policies are simple, and there are relatively few sensible
 445     options. An exponential-size policy (with the initial size and
 446     growth factors both powers of 2) works well with a mask-based
 447     range-hashing function (see <a href=
 448     "#hash_policies">Range-Hashing Policies</a>), and is the
 449     hard-wired policy used by Dinkumware [<a href=
 450     "references.html#dinkumware_stl">dinkumware_stl</a>]. A
 451     prime-list based policy works well with a modulo-prime range
 452     hashing function (see <a href="#hash_policies">Range-Hashing
 453     Policies</a>), and is the hard-wired policy used by SGI's
 454     implementation [<a href=
 455     "references.html#sgi_stl">sgi_stl</a>].</p>
 456
 457     <h3><a name="trigger_policies" id="trigger_policies">Trigger
 458     Policies</a></h3>
 459
 460     <p>Trigger policies determine when a hash table changes size.
 461     Following is a description of two policies: <i>load-check</i>
 462     policies, and collision-check policies.</p>
 463
 464     <p>Load-check policies are straightforward. The user specifies
 465     two factors, <i>&alpha;<sub>min</sub></i> and
 466     <i>&alpha;<sub>max</sub></i>, and the hash table maintains the
 467     invariant that</p>
 468
 469     <p><i><a name="load_factor_min_max" id=
 470     "load_factor_min_max">&alpha;<sub>min</sub> &le; (number of
 471     stored elements) / (hash-table size) &le;
 472     &alpha;<sub>max</sub></a></i> (1) .</p>
 473
 474     <p>Collision-check policies work in the opposite direction of
 475     load-check policies. They focus on keeping the number of
 476     collisions moderate and hoping that the size of the table will
 477     not grow very large, instead of keeping a moderate load-factor
 478     and hoping that the number of collisions will be small. A
 479     maximal collision-check policy resizes when the longest
 480     probe-sequence grows too large.</p>
 481
 482     <p>Consider Figure <a href="#balls_and_bins">Balls and
 483     bins</a>. Let the size of the hash table be denoted by
 484     <i>m</i>, the length of a probe sequence be denoted by
 485     <i>k</i>, and some load factor be denoted by &alpha;. We would
 486     like to calculate the minimal length of <i>k</i>, such that if
 487     there were <i>&alpha; m</i> elements in the hash table, a probe
 488     sequence of length <i>k</i> would be found with probability at
 489     most <i>1/m</i>.</p>
 490
 491     <h6 class="c1"><a name="balls_and_bins" id=
 492     "balls_and_bins"><img src="balls_and_bins.png" alt=
 493     "no image" /></a></h6>
 494
 495     <h6 class="c1">Balls and bins.</h6>
 496
 497     <p>Denote the probability that a probe sequence of length
 498     <i>k</i> appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the
 499     length of the probe sequence of bin <i>i</i> by
 500     <i>l<sub>i</sub></i>, and assume uniform distribution. Then</p>
 501
 502     <p><a name="prob_of_p1" id=
 503     "prob_of_p1"><i>p<sub>1</sub></i></a> = (3)</p>
 504
 505     <p class="c2"><b>P</b>(l<sub>1</sub> &ge; k) =</p>
 506
 507     <p><i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1
 508     ) &le;</i> (a)</p>
 509
 510     <p><i>e ^ ( - ( &alpha; ( k / &alpha; - 1 )<sup>2</sup> ) /2
 511     )</i> ,</p>
 512
 513     <p>where (a) follows from the Chernoff bound [<a href=
 514     "references.html#motwani95random">motwani95random</a>]. To
 515     calculate the probability that <i>some</i> bin contains a probe
 516     sequence greater than <i>k</i>, we note that the
 517     <i>l<sub>i</sub></i> are negatively-dependent [<a href=
 518     "references.html#dubhashi98neg">dubhashi98neg</a>]. Let
 519     <i><b>I</b>(.)</i> denote the indicator function. Then</p>
 520
 521     <p><a name="at_least_k_i_n_some_bin" id=
 522     "at_least_k_i_n_some_bin"><i><b>P</b>( exists<sub>i</sub>
 523     l<sub>i</sub> &ge; k ) =</i> (3)</a></p>
 524
 525     <p class="c2"><b>P</b> ( &sum; <sub>i = 1</sub><sup>m</sup>
 526     <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1 ) =</p>
 527
 528     <p><i><b>P</b> ( &sum; <sub>i = 1</sub><sup>m</sup> <b>I</b> (
 529     l<sub>i</sub> &ge; k ) &ge; m p<sub>1</sub> ( 1 + 1 / (m
 530     p<sub>1</sub>) - 1 ) ) &le;</i> (a)</p>
 531
 532     <p class="c2">e ^ ( ( - m p<sub>1</sub> ( 1 / (m p<sub>1</sub>)
 533     - 1 ) <sup>2</sup> ) / 2 ) ,</p>
 534
 535     <p>where (a) follows from the fact that the Chernoff bound can
 536     be applied to negatively-dependent variables [<a href=
 537     "references.html#dubhashi98neg">dubhashi98neg</a>]. Inserting
 538     <a href="#prob_of_p1">(2)</a> into <a href=
 539     "#at_least_k_i_n_some_bin">(3)</a>, and equating with
 540     <i>1/m</i>, we obtain</p>
 541
 542     <p><i>k ~ &radic; ( 2 &alpha;</i> ln <i>2 m</i> ln<i>(m) )
 543     )</i> .</p>
 544
 545     <h3><a name="imp_pb_ds" id="imp_pb_ds">Implementation</a></h3>
 546
 547     <p>This sub-subsection describes the implementation of the
 548     above in <tt>pb_ds</tt>. It first describes resize policies and
 549     their decomposition into trigger and size policies, then
 550     describes pre-defined classes, and finally discusses controlled
 551     access the policies' internals.</p>
 552
 553     <h4>Resize Policies and Their Decomposition</h4>
 554
 555     <p>Each hash-based container is parametrized by a
 556     <tt>Resize_Policy</tt> parameter; the container derives
 557     <tt><b>public</b></tt>ly from <tt>Resize_Policy</tt>. For
 558     example:</p>
 559     <pre>
 560 <a href="cc_hash_table.html">cc_hash_table</a>&lt;
 561     <b>typename</b> Key,
 562     <b>typename</b> Mapped,
 563     ...
 564     <b>typename</b> Resize_Policy
 565     ...&gt; :
 566         <b>public</b> Resize_Policy
 567 </pre>
 568
 569     <p>As a container object is modified, it continuously notifies
 570     its <tt>Resize_Policy</tt> base of internal changes
 571     (<i>e.g.</i>, collisions encountered and elements being
 572     inserted). It queries its <tt>Resize_Policy</tt> base whether
 573     it needs to be resized, and if so, to what size.</p>
 574
 575     <p>Figure <a href="#insert_resize_sequence_diagram1">Insert
 576     resize sequence diagram</a> shows a (possible) sequence diagram
 577     of an insert operation. The user inserts an element; the hash
 578     table notifies its resize policy that a search has started
 579     (point A); in this case, a single collision is encountered -
 580     the table notifies its resize policy of this (point B); the
 581     container finally notifies its resize policy that the search
 582     has ended (point C); it then queries its resize policy whether
 583     a resize is needed, and if so, what is the new size (points D
 584     to G); following the resize, it notifies the policy that a
 585     resize has completed (point H); finally, the element is
 586     inserted, and the policy notified (point I).</p>
 587
 588     <h6 class="c1"><a name="insert_resize_sequence_diagram1" id=
 589     "insert_resize_sequence_diagram1"><img src=
 590     "insert_resize_sequence_diagram1.png" alt=
 591     "no image" /></a></h6>
 592
 593     <h6 class="c1">Insert resize sequence diagram.</h6>
 594
 595     <p>In practice, a resize policy can be usually orthogonally
 596     decomposed to a size policy and a trigger policy. Consequently,
 597     the library contains a single class for instantiating a resize
 598     policy: <a href=
 599     "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 600     is parametrized by <tt>Size_Policy</tt> and
 601     <tt>Trigger_Policy</tt>, derives <tt><b>public</b></tt>ly from
 602     both, and acts as a standard delegate [<a href=
 603     "references.html#gamma95designpatterns">gamma95designpatterns</a>]
 604     to these policies.</p>
 605
 606     <p>Figures <a href="#insert_resize_sequence_diagram2">Standard
 607     resize policy trigger sequence diagram</a> and <a href=
 608     "#insert_resize_sequence_diagram3">Standard resize policy size
 609     sequence diagram</a> show sequence diagrams illustrating the
 610     interaction between the standard resize policy and its trigger
 611     and size policies, respectively.</p>
 612
 613     <h6 class="c1"><a name="insert_resize_sequence_diagram2" id=
 614     "insert_resize_sequence_diagram2"><img src=
 615     "insert_resize_sequence_diagram2.png" alt=
 616     "no image" /></a></h6>
 617
 618     <h6 class="c1">Standard resize policy trigger sequence
 619     diagram.</h6>
 620
 621     <h6 class="c1"><a name="insert_resize_sequence_diagram3" id=
 622     "insert_resize_sequence_diagram3"><img src=
 623     "insert_resize_sequence_diagram3.png" alt=
 624     "no image" /></a></h6>
 625
 626     <h6 class="c1">Standard resize policy size sequence
 627     diagram.</h6>
 628
 629     <h4>Pre-Defined Policies</h4>
 630
 631     <p>The library includes the following
 632     instantiations of size and trigger policies:</p>
 633
 634     <ol>
 635       <li><a href=
 636       "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
 637       implements a load check trigger policy.</li>
 638
 639       <li><a href=
 640       "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
 641       implements a collision check trigger policy.</li>
 642
 643       <li><a href=
 644       "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>
 645       implements an exponential-size policy (which should be used
 646       with mask range hashing).</li>
 647
 648       <li><a href=
 649       "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>
 650       implementing a size policy based on a sequence of primes
 651       [<a href="references.html#sgi_stl">sgi_stl</a>] (which should
 652       be used with mod range hashing</li>
 653     </ol>
 654
 655     <p>Figure <a href="#resize_policy_cd">Resize policy class
 656     diagram</a> gives an overall picture of the resize-related
 657     classes. <a href=
 658     "basic_hash_table.html"><tt>basic_hash_table</tt></a>
 659     is parametrized by <tt>Resize_Policy</tt>, which it subclasses
 660     publicly. This class is currently instantiated only by <a href=
 661     "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
 662     <a href=
 663     "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 664     itself is parametrized by <tt>Trigger_Policy</tt> and
 665     <tt>Size_Policy</tt>. Currently, <tt>Trigger_Policy</tt> is
 666     instantiated by <a href=
 667     "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
 668     or <a href=
 669     "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>;
 670     <tt>Size_Policy</tt> is instantiated by <a href=
 671     "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
 672     or <a href=
 673     "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.</p>
 674
 675     <h6 class="c1"><a name="resize_policy_cd" id=
 676     "resize_policy_cd"><img src="resize_policy_cd.png" alt=
 677     "no image" /></a></h6>
 678
 679     <h6 class="c1">Resize policy class diagram.</h6>
 680
 681     <h4>Controlled Access to Policies' Internals</h4>
 682
 683     <p>There are cases where (controlled) access to resize
 684     policies' internals is beneficial. <i>E.g.</i>, it is sometimes
 685     useful to query a hash-table for the table's actual size (as
 686     opposed to its <tt>size()</tt> - the number of values it
 687     currently holds); it is sometimes useful to set a table's
 688     initial size, externally resize it, or change load factors.</p>
 689
 690     <p>Clearly, supporting such methods both decreases the
 691     encapsulation of hash-based containers, and increases the
 692     diversity between different associative-containers' interfaces.
 693     Conversely, omitting such methods can decrease containers'
 694     flexibility.</p>
 695
 696     <p>In order to avoid, to the extent possible, the above
 697     conflict, the hash-based containers themselves do not address
 698     any of these questions; this is deferred to the resize policies,
 699     which are easier to change or replace. Thus, for example,
 700     neither <a href=
 701     "cc_hash_table.html"><tt>cc_hash_table</tt></a> nor
 702     <a href=
 703     "gp_hash_table.html"><tt>gp_hash_table</tt></a>
 704     contain methods for querying the actual size of the table; this
 705     is deferred to <a href=
 706     "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.</p>
 707
 708     <p>Furthermore, the policies themselves are parametrized by
 709     template arguments that determine the methods they support
 710     ([<a href=
 711     "references.html#alexandrescu01modern">alexandrescu01modern</a>]
 712     shows techniques for doing so). <a href=
 713     "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 714     is parametrized by <tt>External_Size_Access</tt> that
 715     determines whether it supports methods for querying the actual
 716     size of the table or resizing it. <a href=
 717     "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>
 718     is parametrized by <tt>External_Load_Access</tt> that
 719     determines whether it supports methods for querying or
 720     modifying the loads. <a href=
 721     "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
 722     is parametrized by <tt>External_Load_Access</tt> that
 723     determines whether it supports methods for querying the
 724     load.</p>
 725
 726     <p>Some operations, for example, resizing a container at
 727     run time, or changing the load factors of a load-check trigger
 728     policy, require the container itself to resize. As mentioned
 729     above, the hash-based containers themselves do not contain
 730     these types of methods, only their resize policies.
 731     Consequently, there must be some mechanism for a resize policy
 732     to manipulate the hash-based container. As the hash-based
 733     container is a subclass of the resize policy, this is done
 734     through virtual methods. Each hash-based container has a
 735     <tt><b>private</b></tt> <tt><b>virtual</b></tt> method:</p>
 736     <pre>
 737 <b>virtual void</b>
 738     do_resize
 739     (size_type new_size);
 740 </pre>
 741
 742     <p>which resizes the container. Implementations of
 743     <tt>Resize_Policy</tt> can export public methods for resizing
 744     the container externally; these methods internally call
 745     <tt>do_resize</tt> to resize the table.</p>
 746
 747     <h2><a name="policy_interaction" id="policy_interaction">Policy
 748     Interaction</a></h2>
 749
 750     <p>Hash-tables are unfortunately especially susceptible to
 751     choice of policies. One of the more complicated aspects of this
 752     is that poor combinations of good policies can form a poor
 753     container. Following are some considerations.</p>
 754
 755     <h3><a name="policy_interaction_probe_size_trigger" id=
 756     "policy_interaction_probe_size_trigger">Probe Policies, Size
 757     Policies, and Trigger Policies</a></h3>
 758
 759     <p>Some combinations do not work well for probing containers.
 760     For example, combining a quadratic probe policy with an
 761     exponential size policy can yield a poor container: when an
 762     element is inserted, a trigger policy might decide that there
 763     is no need to resize, as the table still contains unused
 764     entries; the probe sequence, however, might never reach any of
 765     the unused entries.</p>
 766
 767     <p>Unfortunately, <tt>pb_ds</tt> cannot detect such problems at
 768     compilation (they are halting reducible). It therefore defines
 769     an exception class <a href=
 770     "insert_error.html"><tt>insert_error</tt></a> to throw an
 771     exception in this case.</p>
 772
 773     <h3><a name="policy_interaction_hash_trigger" id=
 774     "policy_interaction_hash_trigger">Hash Policies and Trigger
 775     Policies</a></h3>
 776
 777     <p>Some trigger policies are especially susceptible to poor
 778     hash functions. Suppose, as an extreme case, that the hash
 779     function transforms each key to the same hash value. After some
 780     inserts, a collision detecting policy will always indicate that
 781     the container needs to grow.</p>
 782
 783     <p>The library, therefore, by design, limits each operation to
 784     one resize. For each <tt>insert</tt>, for example, it queries
 785     only once whether a resize is needed.</p>
 786
 787     <h3><a name="policy_interaction_eq_sth_hash" id=
 788     "policy_interaction_eq_sth_hash">Equivalence Functors, Storing
 789     Hash Values, and Hash Functions</a></h3>
 790
 791     <p><a href=
 792     "cc_hash_table.html"><tt>cc_hash_table</tt></a> and
 793     <a href=
 794     "gp_hash_table.html"><tt>gp_hash_table</tt></a> are
 795     parametrized by an equivalence functor and by a
 796     <tt>Store_Hash</tt> parameter. If the latter parameter is
 797     <tt><b>true</b></tt>, then the container stores with each entry
 798     a hash value, and uses this value in case of collisions to
 799     determine whether to apply a hash value. This can lower the
 800     cost of collision for some types, but increase the cost of
 801     collisions for other types.</p>
 802
 803     <p>If a ranged-hash function or ranged probe function is
 804     directly supplied, however, then it makes no sense to store the
 805     hash value with each entry. <tt>pb_ds</tt>'s container will
 806     fail at compilation, by design, if this is attempted.</p>
 807
 808     <h3><a name="policy_interaction_size_load_check" id=
 809     "policy_interaction_size_load_check">Size Policies and
 810     Load-Check Trigger Policies</a></h3>
 811
 812     <p>Assume a size policy issues an increasing sequence of sizes
 813     <i>a, a q, a q<sup>1</sup>, a q<sup>2</sup>, ...</i> For
 814     example, an exponential size policy might issue the sequence of
 815     sizes <i>8, 16, 32, 64, ...</i></p>
 816
 817     <p>If a load-check trigger policy is used, with loads
 818     <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>,
 819     respectively, then it is a good idea to have:</p>
 820
 821     <ol>
 822       <li><i>&alpha;<sub>max</sub> ~ 1 / q</i></li>
 823
 824       <li><i>&alpha;<sub>min</sub> &lt; 1 / (2 q)</i></li>
 825     </ol>
 826
 827     <p>This will ensure that the amortized hash cost of each
 828     modifying operation is at most approximately 3.</p>
 829
 830     <p><i>&alpha;<sub>min</sub> ~ &alpha;<sub>max</sub></i> is, in
 831     any case, a bad choice, and <i>&alpha;<sub>min</sub> &gt;
 832     &alpha;<sub>max</sub></i> is horrendous.</p>
 833   </div>
 834 </body>
 835 </html>