libstdc++-v3/docs/html/ext/pb_assoc/hash_based_containers.html

   1 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
   2 <html>
   3     <head>
   4         <title>Hash-Based Containers</title>
   5         <meta name="GENERATOR" content="Microsoft Visual Studio .NET 7.1">
   6         <meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
   7     </head>
   8
   9 <body bgcolor = "white">
  10
  11 <h1>Hash-Based Containers</h1>
  12
  13 <p>
  14     This section describes hash-based containers. It is organized
  15 as follows.
  16 </p>
  17
  18 <ol>
  19         <li>
  20                 <a href = "#overview">Overview</a> is an overview.
  21         </li>
  22         <li>
  23                 <a href = "#hash_policies">Hash Policies</a> discusses
  24         hash policies.
  25         </li>
  26         <li>
  27                 <a href = "#resize_policies">Resize Policies</a> discusses
  28         resize policies.
  29         </li>
  30         <li>
  31                 <a href = "#policy_interaction">Policy Interaction</a> discusses
  32         interaction between policies.
  33         </li>
  34 </ol>
  35
  36
  37
  38 <h2><a name = "overview">Overview</a></h2>
  39
  40
  41 <p>
  42         Figure
  43 <a href = "#hash_cd">Hash-based containers</a>
  44         shows the container-hierarchy; the hash-based containers are circled.
  45 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
  46 is a collision-chaining hash-based container;
  47 <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
  48 is a (general) probing hash-based container.
  49 </p>
  50
  51 <h6 align = "center">
  52 <a name = "hash_cd">
  53 <img src = "hash_cd.jpg" width = "70%" alt = "no image">
  54 </h6>
  55 <h6 align = "center">
  56 </a>
  57 Hash-based containers.
  58 </h6>
  59
  60 <p>
  61         The collision-chaining hash-based container has the following declaration.
  62 </p>
  63 <pre>
  64 <b>template</b>&lt;
  65         <b>typename</b> Key,
  66         <b>typename</b> Data,
  67         <b>class</b> Hash_Fn = std::hash&lt;Key&gt;,
  68         <b>class</b> Eq_Fn = std::equal_to&lt;Key&gt;,
  69         <b>class</b> Comb_Hash_Fn =
  70                 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
  71         <b>class</b> Resize_Policy = <i>default explained below.</i>
  72         <b>bool</b> Store_Hash = <b>false</b>,
  73         <b>class</b> Allocator =
  74                 std::allocator&lt;<b>char</b>&gt; &gt;
  75 <b>class</b> <a href = "cc_hash_assoc_cntnr.html">cc_hash_assoc_cntnr</a>;
  76 </pre>
  77
  78 <p>
  79         The parameters have the following meaning:
  80 </p>
  81 <ol>
  82         <li> <tt>Key</tt> is the key type.
  83         </li>
  84         <li> <tt>Data</tt> is the data-policy, and is explained in
  85 <a href = "ms_gen.html#ds_policy">Mapping-Semantics Genericity::Data Types as a Policy</a>.
  86         </li>
  87         <li> <tt>Hash_Fn</tt> is a key hashing functor.</li>
  88         <li> <tt>Eq_Fn</tt> is a key equivalence functor.</li>
  89         <li> <tt>Comb_Hash_Fn</tt> is a <i>range-hashing_functor</i>; it
  90 describes how to translate hash values into positions within the table.
  91 This is described in
  92 <a name = "#hash_policies">Hash Policies</a>.</li>
  93         </li>
  94         <li> <tt>Resize_Policy</tt> describes how a container object should
  95 change its internal size. This is described in
  96 <a name = #resize_policies">Resize Policies</a>.</li>
  97         <li> <tt>Store_Hash</tt> indicates whether the hash value should
  98 be stored with each entry. This is described in
  99 <a name = "#policy_interaction">Policy Interaction</a>.</li>
 100         <li> <tt>Allocator</tt> is (surprisingly) an allocator type.
 101         </li>
 102 </ol>
 103
 104 <p>
 105         The probing hash-based container has the following declaration.
 106 </p>
 107 <pre>
 108 <b>template</b>&lt;
 109         <b>typename</b> Key,
 110         <b>typename</b> Data,
 111         <b>class</b> Hash_Fn =
 112                 std::hash&lt;
 113                         Key&gt;,
 114         <b>class</b> Eq_Fn =
 115                 std::equal_to&lt;
 116                         Key&gt;,
 117         <b>class</b> Comb_Probe_Fn =
 118                 <a href = "direct_mask_range_hashing.html">direct_mask_range_hashing</a>&lt;&gt;
 119         <b>class</b> Probe_Fn = <i>default explained below.</i>
 120         <b>class</b> Resize_Policy = <i>default explained below.</i>
 121         <b>bool</b> Store_Hash = <b>false</b>,
 122         <b>class</b> Allocator =
 123                 std::allocator&lt;<b>char</b>&gt; &gt;
 124 <b>class</b> <a href = "gp_hash_assoc_cntnr.html">gp_hash_assoc_cntnr</a>;
 125 </pre>
 126
 127 <p>
 128         The parameters are identical to those of the collision-chaining container, except
 129 for the following.
 130 </p>
 131 <ol>
 132         <li> <tt>Comb_Probe_Fn</tt> describes how to transform a probe sequence into
 133 a sequence of positions within the table.
 134         </li>
 135         <li> <tt>Probe_Fn</tt> describes a probe sequence policy.</li>
 136 </ol>
 137
 138
 139 <p>
 140         Some of the default template values depend on the values of other parameters,
 141 and are explained in
 142 <a name = "#policy_interaction">Policy Interaction</a>.
 143 </p>
 144
 145 <h2><a name = "hash_policies">Hash Policies</h2></a>
 146 <p>
 147     This subsection describes hash policies. It is organized as follows:
 148 </p>
 149 <ol>
 150     <li> <a href = "#general_terms">General Terms</a>  describes
 151             some general terms.
 152     </li>
 153     <li> <a href = "#range_hashing_fns">Range-Hashing Functions</a>
 154         describes range-hasing functions.</li>
 155     <li> <a href = "#hash_policies_ranged_hash_policies">Ranged-Hash Functions</a>
 156         describes ranged-hash functions. </li>
 157     <li> <a href = "#pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a>
 158             describes the implementation of these concepts in <tt>pb_assoc</tt>.
 159     </li>
 160 </ol>
 161
 162
 163 <h3><a name="general_terms">General Terms</a></h3>
 164
 165 <p>
 166     There
 167 are actually three functions involved in transforming a key into a hash-table's
 168 position (see Figure
 169 <a href = "#hash_ranged_hash_range_hashing_fns">
 170 Hash runctions, ranged-hash functions, and range-hashing functions
 171 </a>):
 172 </p>
 173 <ol>
 174     <li>
 175         A <i>ranged-hash</i> function, which maps keys into an interval of the
 176         non-negative integrals. This is the function actually required by the
 177         hash-table algorithm.
 178     </li>
 179     <li>
 180         A hash function, which maps keys into non-negative integral types. This is
 181         typically specified by the writer of the key class.
 182     </li>
 183     <li>
 184         A <i>range-hashing</i> function, which maps non-negative integral types into an
 185         interval of non-negative integral types.
 186     </li>
 187 </ol>
 188
 189 <h6 align = "center">
 190 <a name = "hash_ranged_hash_range_hashing_fns">
 191 <img src = "hash_ranged_hash_range_hashing_fns.jpg" width = "70%" alt = "no image">
 192 </h6>
 193 <h6 align = "center">
 194 </a>
 195 Hash runctions, ranged-hash functions, and range-hashing functions.
 196 </h6>
 197
 198 <p>
 199     Let <i>U</i> be a domain (<i>e.g.</i>, the integers, or the strings of 3
 200     characters). A hash-table algorithm needs to map elements of <i>U</i> "uniformly"
 201     into the range <i>[0,..., m - 1]</i> (where <i>m</i> is a non-negative integral
 202     value, and is, in general, time varying). <i>I.e.</i>, the algorithm needs a <i>ranged-hash</i>
 203     function
 204 </p>
 205 <p>
 206     <i>f : U &times; Z<sub>+</sub> &rarr; Z<sub>+</sub> </i>,
 207 </p>
 208 <p>
 209     such that for any <i>u</i> in <i>U</i>
 210 ,
 211 </p>
 212 <p>
 213     <i>0 &le; f(u, m) &le; m - 1 </i>,
 214 </p>
 215 <p>
 216     and which has "good uniformity" properties [<a href="references.html#knuth98sorting">knuth98sorting</a>].
 217     One common solution is to use the composition of the hash function
 218 </p>
 219 <p>
 220     <i>h : U &rarr; Z<sub>+</sub> </i>,
 221 </p>
 222 <p>
 223     which maps elements of <i>U</i> into the non-negative integrals, and
 224 </p>
 225 <p>
 226     <i>g : Z<sub>+</sub> &times; Z<sub>+</sub> &rarr; Z<sub>+</sub>, </i>
 227 </p>
 228 <p>
 229     which maps a non-negative hash value, and a non-negative range upper-bound into
 230     a non-negative integral in the range between 0 (inclusive) and the range upper
 231     bound (exclusive), <i>i.e.</i>, for any <i>r</i> in <i>Z<sub>+</sub></i>,
 232 </p>
 233 <p>
 234     <i>0 &le; g(r, m) &le; m - 1 </i>.
 235 </p>
 236 <p>
 237     The resulting ranged-hash function, is
 238 </p>
 239 <p>
 240     <i><a name="eqn:ranged_hash_composed_of_hash_and_range_hashing">f(u , m) = g(h(u), m) </a>
 241     </i>(1) .
 242 </p>
 243
 244 <p>
 245     From the above, it is obvious that given <i>g</i> and <i>h</i>, <i>f</i> can
 246     always be composed (however the converse is not true). The STL's hash-based
 247     containers allow specifying a hash function, and use a hard-wired range-hashing function; the ranged-hash function is implicitly composed.
 248 </p>
 249
 250
 251 <p>
 252     The above describes the case where a key is to be mapped into a <i>single
 253 position</i> within a hash table, <i>e.g.</i>, in a collision-chaining table.
 254 In other cases, a key is to be mapped into a <i>sequence of poisitions</i>
 255 within a table, <i>e.g.</i>, in a probing table.
 256 </p>
 257 <p>
 258     Similar terms apply in this case: the table requires a <i>ranged probe</i>
 259 function, mapping a key into a sequence of positions withing the table. This is
 260 typically acheived by composing a <i>hash function</i> mapping the key
 261 into a non-negative integral type, a <i>probe</i> function transforming the
 262 hash value into a sequence of hash values, and a <i>range-hashing</i> function
 263 transforming the sequence of hash values into a sequence of positions.
 264 </p>
 265
 266
 267 <h3><a name="range_hashing_fns">Range-Hashing Functions</a></h3>
 268
 269 <p>
 270     Some common choices for range-hashing functions are the division,
 271     multiplication, and middle-square methods [<a href="references.html#knuth98sorting">knuth98sorting</a>],
 272     defined as
 273 </p>
 274 <p>
 275     <i><a name="eqn:division_method">g(r, m) = r mod m </a></i>(2) ,
 276 </p>
 277 <p>
 278     <i>g(r, m) = &lceil; u/v ( a r mod v ) &rceil; </i>,
 279 </p>
 280 <p>
 281     and
 282 </p>
 283 <p>
 284     <i>g(r, m) = &lceil; u/v ( r<sup>2</sup> mod v ) &rceil; </i>,
 285 </p>
 286 <p>
 287 respectively, for some positive integrals <i>u</i> and <i>v</i> (typically
 288 powers of 2), and some <i>a</i>. Each of these range-hashing functions works
 289 best for some different setting.
 290 </p>
 291 <p>
 292     The division method <a href="#division_method">(2)</a> is a very common
 293     choice. However, even this single method can be implemented in two very
 294     different ways. It is possible to implement <a href="#division_method">(2)</a>
 295     using the low level <i>%</i> (modulo) operation (for any <i>m</i>), or the low
 296     level <i>&amp;</i> (bit-mask) operation (for the case where <i>m</i> is a power of
 297     2), <i>i.e.</i>,
 298 </p>
 299 <p>
 300     <i><a name="eqn:division_method_prime_mod">g(r, m) = r % m </a></i>(3) ,
 301 </p>
 302 <p>
 303     and
 304 </p>
 305 <p>
 306     <a name="eqn:division_method_bit_mask">
 307     <i>g(r, m) = r &amp; m - 1, ( m = 2<sup>k</sup>
 308     </i>
 309         for some<i> k) </i></a>(4) ,
 310 </p>
 311 <p>
 312     respectively.
 313 </p>
 314 <p>
 315     The <i>%</i> (modulo) implementation <a href="#division_method_prime_mod">(3)</a>
 316     has the advantage that for <i>m</i> a prime far from a power of 2, <i>g(r, m)</i>
 317     is affected by all the bits of <i>r</i> (minimizing the chance of collision).
 318     It has the disadvantage of using the costly modulo operation. This method is
 319     hard-wired into SGI's implementation [<a href="references.html#sgi_stl">sgi_stl</a>].
 320 </p>
 321
 322 <p>
 323     The <i>&amp;</i> (bit-mask) implementation <a href="#division_method_bit_mask">(4)</a>
 324     has the advantage of relying on the fast bitwise and operation. It has the
 325     disadvantage that for <i>g(r, m)</i> is affected only by the low order bits of <i>r</i>.
 326     This method is hard-wired into Dinkumware's implementation [<a href="references.html#dinkumware_stl">dinkumware_stl</a>].
 327 </p>
 328
 329
 330
 331
 332 <h3><a name="hash_policies_ranged_hash_policies">Ranged-Hash Functions</a></h3>
 333
 334 <p>
 335     In some less frequent cases it is beneficial to allow the client to
 336 directly specify a ranged-hash hash function. It is true, that the writer of
 337 the ranged-hash function cannot rely on the values of <i>m</i> having specific
 338 numerical properties suitable for hashing (in the sense used in [<a href="references.html#knuth98sorting">knuth98sorting</a>]),
 339 since the values of <i>m</i> are determined by a resize policy with possibly
 340 orthogonal considerations.
 341 </p>
 342
 343 <p>
 344         There are two cases where a ranged-hash function can be superior. The firs is when using perfect hashing
 345 [<a href="references.html#knuth98sorting">knuth98sorting</a>]; the second
 346 is when the values of <i>m</i> can be used to estimate the
 347 "general" number of distinct values required. This is described in the following.
 348 </p>
 349
 350 <p>
 351     Let
 352 </p>
 353
 354 <p>
 355     <i>s = [ s<sub>0</sub>,..., s<sub>t - 1</sub>] </i>
 356 </p>
 357
 358 <p>
 359     be a string of <i>t</i> characters, each of which is from domain <i>S</i>.
 360 Consider the following ranged-hash function:
 361 </p>
 362
 363 <p>
 364     <a name="eqn:total_string_dna_hash">
 365         <i>
 366             f<sub>1</sub>(s, m) =
 367             &sum; <sub>i =
 368             0</sub><sup>t   - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod<i> m </i>
 369     </a> (5) ,
 370 </p>
 371
 372 <p>
 373     where <i>a</i> is some non-negative integral value. This is the standard
 374 string-hashing function used in SGI's implementation (with <i>a = 5</i>) [<a href="references.html#sgi_stl">sgi_stl</a>].
 375 Its advantage is that it takes into account all of the characters of the
 376 string.
 377 </p>
 378
 379 <p>
 380     Now assume that <i>s</i> is the string representation of a of a long DNA
 381 sequence (and so <i>S = {'A', 'C', 'G', 'T'}</i>). In this case, scanning the
 382 entire string might be prohibitively expensive. A possible alternative might be
 383 to use only the first <i>k</i> characters of the string, where
 384 </p>
 385
 386 <p>
 387     k <sup>|S|</sup> &ge; m ,
 388 </p>
 389 <p>
 390     <i>i.e.</i>, using the hash function
 391 </p>
 392 <p>
 393     <a name="eqn:only_k_string_dna_hash"><i>f<sub>2</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k
 394                 - 1</sup> s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i></a>, (6)
 395 </p>
 396 <p>
 397     requiring scanning over only
 398 </p>
 399 <p>
 400     <i>k = </i>log<i><sub>4</sub>( m ) </i>
 401 </p>
 402 <p>
 403     characters.
 404 </p>
 405 <p>
 406     Other more elaborate hash-functions might scan <i>k</i> characters starting at
 407     a random position (determined at each resize), or scanning <i>k</i> random
 408     positions (determined at each resize), <i>i.e.</i>, using
 409 </p>
 410 <p>
 411     <i>f<sub>3</sub>(s, m) = &sum; <sub>i = r<sub>0</sub></sub><sup>r<sub>0</sub> + k - 1</sup>
 412         s<sub>i</sub> a<sup>i</sup> </i>mod <i>m </i>,
 413 </p>
 414 <p>
 415     or
 416 </p>
 417 <p>
 418     <i>f<sub>4</sub>(s, m) = &sum; <sub>i = 0</sub><sup>k - 1</sup> s<sub>r<sub>i</sub></sub>
 419         a<sup>r<sub>i</sub></sup> </i>mod <i>m </i>,
 420 </p>
 421 <p>
 422 <p>
 423     respectively, for <i>r<sub>0</sub>,..., r<sub>k-1</sub></i> each in the
 424     (inclusive) range <i>[0,...,t-1]</i>.
 425 </p>
 426
 427
 428 <h3><a name="pb_assoc_imp">Implementation in <tt>pb_assoc</tt></a></h3>
 429
 430 <p>
 431 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a> is
 432 parameterized by <tt>Hash_Fn</tt> and <tt>Comb_Hash_Fn</tt>, a hash functor
 433 and a combining hash functor, respectively.
 434 </p>
 435
 436 <p>
 437         For any hash functor except <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
 438 one of the
 439 <a href = "concepts.html#concepts_null_policies">Concepts::Null Policy Classes</a>,
 440 then <tt>Comb_Hash_Fn</tt> is considered a range-hashing functor.
 441 The container will synthesize a ranged-hash functor from both. For example, Figure
 442 <a href = "#hash_range_hashing_seq_diagram">
 443 Insert hash sequence diagram
 444 </a>
 445 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
 446 the container transforms the key into a non-negative integral using the hash
 447 functor (points B and C), and transforms the result into a position
 448 using the combining functor (points D and E).
 449 </p>
 450
 451 <h6 align = "center">
 452 <a name = "hash_range_hashing_seq_diagram">
 453 <img src = "hash_range_hashing_seq_diagram.jpg" width = "70%" alt = "no image">
 454 </a>
 455 </h6>
 456 <h6 align = "center">
 457 Insert hash sequence diagram.
 458 </h6>
 459
 460
 461 <p>
 462     <tt>pb_assoc</tt> contains the following range-hashing policies:
 463 </p>
 464
 465 <ol>
 466     <li>
 467 <a href = "direct_mask_range_hashing.html"><tt>direct_mask_range_hashing</tt></a>
 468 and
 469 <a href = "direct_mod_range_hashing.html"><tt>direct_mod_range_hashing</tt></a>
 470 are range-hashing functions based on a bit-mask and a modulo operation, respectively.
 471     </li>
 472 </ol>
 473
 474
 475 <p>
 476     If <tt>Comb_Hash_Fn</tt> is instantiated by
 477 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a>,
 478 and a combining-hash functor, the container treats
 479 the combining hash functor as a ranged-hash function. For example, Figure
 480 <a href = "#hash_range_hashing_seq_diagram2">
 481 Insert hash sequence diagram with a null combination policy
 482 </a>
 483 shows an <tt>insert</tt> sequence diagram. The user inserts an element (point A),
 484 the container transforms the key into a position
 485 using the combining functor (points B and C).
 486 </p>
 487
 488
 489 <h6 align = "center">
 490 <a name = "hash_range_hashing_seq_diagram2">
 491 <img src = "hash_range_hashing_seq_diagram2.jpg" width = "70%" alt = "no image">
 492 </a>
 493 </h6>
 494 <h6 align = "center">
 495 Insert hash sequence diagram with a null combination policy.
 496 </h6>
 497
 498 <p>
 499         Similarly,
 500 <a href = "gp_hash_assoc_cntnr.html"></a><tt>gp_hash_assoc_cntnr</tt>
 501 is parameterized by <tt>Hash_Fn</tt>, <tt>Probe_Fn</tt>, and
 502 <tt>Comb_Probe_Fn</tt>. As before, if <tt>Probe_Fn</tt>
 503 and <tt>Comb_Probe_Fn</tt> are, respectively,
 504 <a href = "null_hash_fn.html"><tt>null_hash_fn</tt></a> and
 505 <a href = "null_probe_fn.html"><tt>null_probe_fn</tt></a>,
 506 then <tt>Comb_Probe_Fn</tt> is a ranged-probe functor. Otherwise, <tt>Hash_Fn</tt>
 507 is a hash functor, <tt>Probe_Fn</tt> is a functor for offsets from a hash value,
 508 and <tt>Comb_Probe_Fn</tt> transforms a probe sequence into a sequence of positions
 509 within the table.
 510 </p>
 511
 512 <p>
 513         <tt>pb_assoc</tt> contains the following probe policies:
 514 </p>
 515
 516 <ol>
 517     <li>
 518 <a href = "linear_probe_fn.html"><tt>linear_probe_fn</tt></a> is a linear probe
 519 function.
 520     </li>
 521         <li>
 522 <a href = "quadratic_probe_fn.html"><tt>quadratic_probe_fn</tt></a> is
 523 a quadratic probe function.
 524     </li>
 525 </ol>
 526
 527
 528
 529
 530
 531
 532
 533
 534
 535 <h2><a name = "resize_policies">Resize Policies</a></h2>
 536
 537 <p>
 538     This subsection describes resize policies. It is organized as follows:
 539 </p>
 540
 541 <ol>
 542     <li> <a href = "#general">General Terms</a> describes general
 543         terms.
 544     </li>
 545     <li> <a href = "#size_policies">Size Policies</a>  describes size
 546     policies.
 547     </li>
 548     <li> <a href = "#trigger_policies">Trigger Policies</a>  describes trigger
 549     policies.
 550     </li>   <li> <a href = "#imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a>
 551          describes the implementation of these concepts in <tt>pb_assoc</tt>.
 552     </li>
 553 </ol>
 554
 555
 556 <h3><a name = "general">General Terms</a></h3>
 557
 558 <p>
 559     Hash-tables, as opposed to trees, do not naturally grow or shrink. It
 560 is necessary to specify policies to determine how and when a hash table should change
 561 its size.
 562 </p>
 563
 564 <p>
 565     In general, resize policies can be decomposed into (probably orthogonal)
 566 policies:
 567 </p>
 568 <ol>
 569     <li> A <i>size policy</i> indicating <i>how</i> a hash table should
 570 grow (<i>e.g.,</i> it should multiply by powers of 2).
 571     </li>
 572     <li> A <i>trigger policy</i> indicating <i>when</i> a hash table should
 573 grow (<i>e.g.,</i> a load factor is exceeded).
 574     </li>
 575 </ol>
 576
 577
 578
 579 <h3><a name = "size_policies">Size Policies</a></h3>
 580
 581 <p>
 582     Size policies determine how a hash table
 583 changes size. These policies are simple, and there are relatively
 584 few sensible options. An exponential-size policy (with the initial
 585 size and growth factors both powers of 2) works well with a
 586 mask-based range-hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
 587 and is the
 588 hard-wired policy used by Dinkumware
 589 [<a href = "references.html#dinkumware_stl">dinkumware_stl</a>]. A
 590 prime-list based policy works well with a modulo-prime range
 591 hashing function (see the <a href = "hash_policies.html#hash_policies_range_hashing_policies">Range-Hashing Policies</a> subsection),
 592 and is the
 593 hard-wired policy used by SGI's implementation
 594 [<a href = "references.html#sgi_stl">sgi_stl</a>].
 595 </p>
 596
 597
 598
 599
 600 <h3><a name = "trigger_policies">Trigger Policies</a></h3>
 601
 602 <p>
 603     Trigger policies determine when a hash table changes size.
 604 Following is a description of two polcies: <i>load-check</i>
 605 policies, and a collision-check policies.
 606 </p>
 607
 608 <p>
 609     Load-check policies are straightforward. The user
 610 specifies two factors, <i>&alpha;<sub>min</sub></i> and <i>&alpha;<sub>max</sub></i>, and
 611 the hash table maintains the invariant that
 612 </p>
 613 <p>
 614     <i>
 615         <a name = "eqn:load_factor_min_max">
 616         &alpha;<sub>min</sub>
 617         &le;
 618         (number of stored elements) / (hash-table size)
 619         &le;
 620         &alpha;<sub>max</sub>
 621         </a>
 622     </i>
 623     (1)
 624     .
 625 </p>
 626
 627 <p>
 628     Collision-check policies work in the opposite direction of
 629 load-check policies. They focus on keeping the number of
 630 collisions moderate and hoping
 631 that the size of the table will not grow very large,
 632 instead of keeping a moderate load-factor and
 633 hoping that the number of collisions will be small.
 634 A
 635 maximal collision-check policy resizes when the shortest
 636 probe-sequence grows too large.
 637 </p>
 638
 639
 640 <p>
 641     Consider Figure
 642 <a href = "#balls_and_bins">Balls and bins</a>.
 643     Let the size of the hash table be denoted by <i>m</i>, the
 644 length of a probe sequence be denoted by <i>k</i>, and some load
 645 factor be denoted by &alpha;. We would like to calculate the
 646 minimal length of <i>k</i>, such that if there were <i>&alpha; m</i> elements
 647 in the hash table, a probe sequence of length <i>k</i> would be found
 648 with probability at most <i>1/m</i>.
 649 </p>
 650
 651 <h6 align = "center">
 652 <a name = "balls_and_bins">
 653 <img src = "balls_and_bins.jpg" width = "70%" alt = "no image">
 654 </a>
 655 </h6>
 656 <h6 align = "center">
 657 Balls and bins.
 658 </h6>
 659
 660
 661 <p>
 662     Denote the probability that a probe sequence of length <i>k</i>
 663 appears in bin <i>i</i> by <i>p<sub>i</sub></i>, the length of the probe sequence
 664 of bin <i>i</i> by <i>l<sub>i</sub></i>, and assume uniform distribution.
 665 Then
 666 </p>
 667     <p>
 668     <a name = "eqn:prob_of_p1">
 669         <i>p<sub>1</sub>
 670         = </i>(3)
 671     </a>
 672     </p>
 673     <p>
 674     <i>
 675     <b>P</b>(l<sub>1</sub> &ge; k)
 676     =
 677     </i>
 678     </p>
 679     <p>
 680     <i><b>P</b>(l<sub>1</sub> &ge; &alpha; ( 1 + k / &alpha; - 1 )
 681     &le; </i>(a)
 682     </p>
 683     <p>
 684     <i>
 685     e
 686     ^
 687     (
 688         -
 689         (
 690             &alpha; ( k / &alpha; - 1 )<sup>2</sup>
 691         )
 692         /2
 693     )
 694     </i>
 695     ,
 696 </p>
 697 <p>
 698     where (a) follows from the Chernoff bound
 699 [<a href = "references.html#motwani95random">motwani95random</a>].
 700 To
 701 calculate the probability that <i>some</i> bin contains a probe
 702 sequence greater than <i>k</i>, we note that the <i>l<sub>i</sub></i> are
 703 negatively-dependent
 704 [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
 705 Let <i><b>I</b>(.)</i>
 706 denote the indicator function. Then
 707     <p>
 708     <a name = "eqn:at_least_k_i_n_some_bin">
 709         <i><b>P</b>( exists<sub>i</sub> l<sub>i</sub> &ge; k )
 710         = </i>(3)
 711     </a>
 712     </p>
 713     <p>
 714     <i>
 715    <b>P</b>
 716     (
 717         &sum; <sub>i = 1</sub><sup>m</sup>
 718         <b>I</b>(l<sub>i</sub> &ge; k) &ge; 1
 719     )
 720     =
 721     </i>
 722     </p>
 723     <p>
 724     <i>
 725     <b>P</b>
 726     (
 727         &sum; <sub>i = 1</sub><sup>m</sup>
 728         <b>I</b>
 729         (
 730             l<sub>i</sub> &ge; k
 731         )
 732         &ge;
 733         m  p<sub>1</sub> ( 1 + 1 / (m p<sub>1</sub>) - 1 )
 734     )
 735     &le; </i>(a)
 736     </p>
 737     <p>
 738     <i>
 739     e
 740     ^
 741     (
 742         (
 743             -
 744             m p<sub>1</sub>
 745             (
 746                 1 / (m p<sub>1</sub>) - 1
 747             )
 748             <sup>2</sup>
 749         )
 750         /
 751         2
 752     )
 753     ,
 754     </i>
 755     </p>
 756 <p>
 757 where (a) follows from the fact that the Chernoff bound can be
 758 applied to negatively-dependent variables
 759 [<a href = "references.html#dubhashi98neg">dubhashi98neg</a>].
 760 Inserting <a href = "#prob_of_p1">(2)</a> into
 761 <a href = "#at_least_k_i_n_some_bin">(3)</a>, and equating with <i>1/m</i>,
 762 we obtain
 763 </p>
 764 <p>
 765     <i>
 766     k
 767     ~
 768     &radic;
 769     (
 770         2 &alpha; </i>ln<i> 2 m </i>ln<i>(m) )
 771     )
 772     </i>
 773     .
 774 </p>
 775
 776
 777
 778
 779
 780
 781
 782
 783
 784 <h3><a name = "imp_pb_assoc">Implementation in <tt>pb_assoc</tt></a></h3>
 785
 786 <p>
 787     The resize policies in the previous subsection are conceptually straightforward. The design
 788 of hash-based containers' size-related interface is complicated by some factors.
 789 </p>
 790 <ol>
 791     <li> Most containers, <i>i.e.</i> lists, trees, and vectors, have a single "size" concept. There is no
 792 distinction between the number of entries the container holds and the number of entries it is using. This,
 793 of course, is not the case for hash-based containers. Moreover, even describing the
 794 "actual" size of a hash-based container (as opposed to its logical size) is difficult - a probing-based container
 795 holds entries to elements, even those it does not use, while a chaining-based container holds pointers to entries.
 796     </li>
 797     <li>
 798     The policies mentioned above operate in terms of invariants. <i>E.g.</i> a load-check trigger policy
 799 maintains an invariant concerning the load factor of a container object. This is sometimes too rigid:
 800     <ol>
 801         <li>In some cases it is desirable to allow controlled override of an entire policy, <i>e.g.</i> by externally resizing a container object (or giving it an initial size, which is a special case of externally resizing the container).
 802         </li>
 803         <li>
 804             In other cases it is desirable to allow changing the specifics of a policy in runtime, <i>e.g.</i>, changing the load factors of a load-check policy.
 805         </li>
 806     </ol>
 807     </li>
 808     <li>
 809     Resize policies interact strongly with hash policies. Performance-wise, for example, it is undesirable to use an exponential size policy of powers of two with a modulo range-hashing function, and it is undesirable to use a prime size policy with a mask range-hashing function. In other cases, the effects are more dramatic. For example, using a quadratic probe function with an exponential size policy will probably cause cases where the container object has available entries which are never reached by the probe function. (<a href = "hash_policies.html">Hash Policies</a>
 810 discusses the previous concepts.)
 811     </li>
 812 </ol>
 813
 814 <p>
 815     Clearly, the more of these points an interface addresses, the greater its flexibility but the lower its encapsulation and uniformity between associative containers.
 816 </p>
 817
 818
 819
 820 <p>
 821         This library attempts to address these types of problems by delegating all size-related functionality to
 822 policy classes. Hash-based containers
 823 are parameterized by a resize-policy class (among others), and derive publicly from
 824 the resize-policy class
 825 [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>]
 826  <i>E.g.</i>, a collision-chaining
 827 hash table is defined as follows:
 828 </p>
 829 <pre>
 830 cc_ht_map&lt;
 831   <b>class</b> Key,
 832   <b>class</b> Data,
 833   ...
 834   <b>class</b> Resize_Policy
 835   ...&gt; :
 836     <b>public</b> Resize_Policy
 837 </pre>
 838
 839 <p>
 840     The containers themselves lack any functionality or public interface for manipulating sizes. A container
 841 object merely forwards events to its resize policy object and queries it for needed actions.
 842 </p>
 843
 844 <p>
 845     Figure
 846 <a href = "#insert_resize_sequence_diagram1">
 847 Insert resize sequence diagram
 848 </a>
 849 shows a (possible) sequence diagram of an insert operation.
 850 The user inserts an element; the hash table
 851 notifies its resize policy that a search has started (point A);
 852 in this case, a single collision is encountered - the table
 853 notifies its resize policy of this (point B); the container
 854 finally notifies its resize policy that the search has ended (point C);
 855 it then queries its resize policy whether a resize is needed, and if so,
 856 what is the new size (points D to G); following the resize, it notifies
 857 the policy that a resize has completed (point H); finally, the element
 858 is inserted, and the policy notified (point I).
 859 </p>
 860
 861 <h6 align = "center">
 862 <a name = "insert_resize_sequence_diagram1">
 863 <img src = "insert_resize_sequence_diagram1.jpg" width = "70%" alt = "no image">
 864 </a>
 865 </h6>
 866 <h6 align = "center">
 867 Insert resize sequence diagram.
 868 </h6>
 869
 870 <p>
 871     This addresses, to some extent, the problems mentioned above:
 872 </p>
 873 <ol>
 874     <li>
 875         Different instantiations of range-hashing policies can be met with different instantiations of
 876     resize policies.
 877     </li>
 878     <li>
 879         Questions on size-related interface are avoided, since the containers have no size-related methods. Thus
 880     a container has no method for querying its actual size. It merely continuously forwards enough information to
 881     its resize policy to answer such queries; the designer of the resize policy can decide whether, or how, to design     the appropriate method. Also, a container has no methods for setting its size. It merely queries its
 882 resize policy for an initial size, queries it on a new size (if the resize policy indicates a resize is needed), and
 883 supports a <tt><b>protected virtual</b></tt> function for external resize.
 884     </li>
 885 </ol>
 886
 887 <p>
 888     The library contains a single class for instantiating a resize policy,
 889 <tt>pb_assoc</tt> contains
 890 a standard resize policy,
 891 <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> (the name is explained shortly).
 892 In terms of interface, it is parameterized by a boolean constant indicating whether its public interface supports
 893 queries of actual size and external resize operations (the inclusion and exclusion of these methods in the interface have obvious tradeoffs in terms of encapsulation and flexibility).
 894 ([<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>] shows many techniques for
 895 changing between alternative interfaces at compile time.)
 896 </p>
 897
 898 <p>
 899 As noted before,
 900     size and trigger policies are usually orthogonal.
 901 <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 902 is parameterized by size and trigger policies. For example,
 903 a collision-chaining hash table
 904 is typically be defined as follows:
 905 </p>
 906 <pre>
 907 cc_ht_map&lt;
 908   key,
 909   data,
 910   ...
 911   hash_standard_resize_policy&lt;
 912     some_trigger_policy,
 913     some_size_policy,
 914     ...&gt; &gt;
 915 </pre>
 916
 917 <p>
 918  The sole function of
 919 <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>
 920  is to
 921 act as a standard delegator
 922 [<a href = "references.html#gamma95designpatterns">gamma95designpatterns</a>] for these
 923 policies.
 924
 925 <p>
 926     Figures
 927 <a href = "#insert_resize_sequence_diagram2">Standard resize policy trigger sequence diagram</a>
 928     and
 929 <a href = "#insert_resize_sequence_diagram3">Standard resize policy size sequence diagram</a>
 930     show sequence diagrams illustrating the interaction between
 931     the standard resize policy and its trigger and size policies, respectively.
 932 </p>
 933
 934 <h6 align = "center">
 935 <a name = "insert_resize_sequence_diagram2">
 936 <img src = "insert_resize_sequence_diagram2.jpg" width = "70%" alt = "no image">
 937 </a>
 938 </h6>
 939 <h6 align = "center">
 940 Standard resize policy trigger sequence diagram.
 941 </h6>
 942
 943 <h6 align = "center">
 944 <a name = "insert_resize_sequence_diagram3">
 945 <img src = "insert_resize_sequence_diagram3.jpg" width = "70%" alt = "no image">
 946 </a>
 947 </h6>
 948 <h6 align = "center">
 949 Standard resize policy size sequence diagram.
 950 </h6>
 951
 952 <p>
 953     The library (currently) supports the following instantiations of size
 954 and trigger policies:
 955 </p>
 956
 957 <ol>
 958     <li>
 959         <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a> implements
 960     a load check trigger policy.
 961     </li>
 962     <li>
 963         <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>
 964     implements a collision check trigger policy.
 965     </li>
 966     <li>
 967 <a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a> implemens
 968 an exponential-size policy (which should be used with mask range hashing).
 969     </li>
 970     <li>
 971 <a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a> implementing
 972 a size policy based on a sequence of primes
 973 [<a href = "references.html#sgi_stl">sgi_stl</a>] (which should be used with mod range hashing
 974     </li>
 975 </ol>
 976
 977 <p>
 978     The trigger policies also support interfaces for changing their specifics which depend on compile time constants.
 979 </p>
 980
 981
 982 <p>
 983     Figure
 984 <a href = "#resize_policy_cd">Resize policy class diagram</a> gives an overall picture
 985 of the resize-related classes.
 986 <tt>Container</tt> (which stands for any of the hash-based containers) is parameterized
 987 by <tt>Resize_Policy</tt>, from which it subclasses publicly
 988 [<a href = "references.html#alexandrescu01modern">alexandrescu01modern</a>].
 989 This class is currently instantiated only by
 990 <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a>.
 991 <a href = "hash_standard_resize_policy.html"><tt>hash_standard_resize_policy</tt></a> itself
 992 is parameterized by <tt>Trigger_Policy</tt> and <tt>Size_Policy</tt>.
 993 Currently, <tt>Trigger_Policy</tt> is instantiated by
 994 <a href = "hash_load_check_resize_trigger.html"><tt>hash_load_check_resize_trigger</tt></a>,
 995 or
 996 <a href = "cc_hash_max_collision_check_resize_trigger.html"><tt>cc_hash_max_collision_check_resize_trigger</tt></a>; <tt>Size_Policy</tt> is instantiated by
 997 <a href = "hash_exponential_size_policy.html"><tt>hash_exponential_size_policy</tt></a>,
 998 or
 999 <a href = "hash_prime_size_policy.html"><tt>hash_prime_size_policy</tt></a>.
1000 </p>
1001
1002
1003 <h6 align = "center">
1004 <a name = "resize_policy_cd">
1005 <img src = "resize_policy_cd.jpg" width = "70%" alt = "no image">
1006 </a>
1007 </h6>
1008 <h6 align = "center">
1009 Resize policy class diagram.
1010 </h6>
1011
1012
1013
1014
1015 <h2><a name = "#policy_interaction">Policy Interaction</a></h2>
1016
1017 <p>
1018         Hash-tables are unfortunately susceptible to choice of policies. One
1019 of the more complicated aspects of this is that poor combinations of good policies
1020 can alter performance drastically. Following are some considerations.
1021 </p>
1022
1023
1024
1025
1026
1027 <h3>Range-Hashing Policies and Resize Policies</h3>
1028
1029 <p>
1030 </p>
1031
1032
1033 <h3>Equivalence Functors, Storing Hash Values, and Hash Functions</h3>
1034
1035
1036 <p>
1037 <a href = "cc_hash_assoc_cntnr.html"><tt>cc_hash_assoc_cntnr</tt></a>
1038 and
1039 <a href = "gp_hash_assoc_cntnr.html"><tt>gp_hash_assoc_cntnr</tt></a>
1040 are parameterized by an equivalenc functor and by a <tt>Store_Hash</tt>
1041 parameter. If the latter parameter is <tt><b>true</b></tt>, then
1042 the container stores with each entry a hash value, and uses
1043 this value in case of collisions to determine whether to apply a hash value.
1044 This can lower the cost of collision for some types, but increase the cost of collisions for other types.
1045 </p>
1046
1047 <p>
1048         If a ranged-hash function or ranged probe function is directly supplied, however,
1049 then it makes no sense to store the hash value with each entry. <tt>pb_assoc</tt>'s container will fail at compilation, by design, if this is attempted.
1050 </p>
1051
1052
1053
1054 </body>
1055
1056 </html>