libstdc++-v3/docs/html/ext/mt_allocator.html

   1 <?xml version="1.0" encoding="ISO-8859-1"?>
   2 <!DOCTYPE html
   3           PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   4           "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   5
   6 <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
   7 <head>
   8    <meta name="AUTHOR" content="Stefan Olsson &lt;stefan@xapa.se&gt;" />
   9    <meta name="KEYWORDS" content="c++, libstdc++, g++, allocator, memory" />
  10    <meta name="DESCRIPTION" content="Allocators and allocation" />
  11    <meta name="GENERATOR" content="emacs and ten fingers" />
  12    <title>A fixed-size, multi-thread optimized allocator</title>
  13 <link rel="StyleSheet" href="../lib3styles.css" type="text/css" />
  14 <link rel="Start" href="../documentation.html" type="text/html"
  15   title="GNU C++ Standard Library" />
  16 <link rel="Bookmark" href="howto.html" type="text/html" title="Extensions" />
  17 <link rel="Copyright" href="../17_intro/license.html" type="text/html" />
  18 </head>
  19 <body>
  20
  21 <h1 class="centered"><a name="top">A fixed-size, multi-thread optimized allocator</a></h1>
  22
  23 <p class="fineprint"><em>
  24    The latest version of this document is always available at
  25    <a href="http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html">
  26    http://gcc.gnu.org/onlinedocs/libstdc++/ext/mt_allocator.html</a>.
  27 </em></p>
  28
  29 <p><em>
  30    To the <a href="http://gcc.gnu.org/libstdc++/">libstdc++-v3 homepage</a>.
  31 </em></p>
  32
  33 <!-- ####################################################### -->
  34 <hr />
  35 <h3 class="left">
  36   <a name="intro">Introduction</a>
  37 </h3>
  38
  39 <p> The mt allocator [hereinafter referred to simply as "the
  40 allocator"] is a fixed size (power of two) allocator that was
  41 initially developed specifically to suit the needs of multi threaded
  42 applications [hereinafter referred to as an MT application]. Over time
  43 the allocator has evolved and been improved in many ways, one of the
  44 being that it now also does a good job in single threaded applications
  45 [hereinafter referred to as a ST application].  (Note: In this
  46 document, when referring to single threaded applications this also
  47 includes applications that are compiled with gcc without thread
  48 support enabled. This is accomplished using ifdef's on __GTHREADS)
  49 </p>
  50
  51 <p>
  52 The aim of this document is to describe - from a application point of
  53 view - the "inner workings" of the allocator.
  54 </p>
  55
  56 <h3 class="left">
  57   <a name="init">Tunable parameters</a>
  58 </h3>
  59
  60 <p>Certain allocation parameters can be modified on a per-type
  61 basis. There exists a nested <pre>struct _Tune</pre> that contains all
  62 these parameters, which include settings for
  63 </p>
  64    <ul>
  65      <li>Alignment </li>
  66      <li>Maximum bytes before calling <code>::operator new</code> directly</li>
  67      <li>Minimum bytes</li>
  68      <li>Size of underlying global allocations</li>
  69      <li>Maximum number of supported threads</li>
  70      <li>Migration of deallocations to the global free list</li>
  71      <li>Shunt for global <code>new</code> and <code>delete</code></li>
  72    </ul>
  73 <p>Adjusting parameters for a given instance of an allocator can only
  74 happen before any allocations take place, when the allocator itself is
  75 initialized. For instance:
  76 </p>
  77 <pre>
  78 #include &lt;ext/mt_allocator.h&gt;
  79
  80 struct pod
  81 {
  82   int i;
  83   int j;
  84 };
  85
  86 int main()
  87 {
  88   typedef pod value_type;
  89   typedef __gnu_cxx::__mt_alloc&lt;value_type&gt; allocator_type;
  90   typedef allocator_type::_Tune tune_type;
  91
  92   tune_type t_default;
  93   tune_type t_opt(16, 5120, 32, 5120, 20, 10, false);
  94   tune_type t_single(16, 5120, 32, 5120, 1, 10, false);
  95
  96   tune_type t;
  97   t = allocator_type::_S_get_options();
  98   allocator_type::_S_set_options(t_opt);
  99   t = allocator_type::_S_get_options();
 100
 101   allocator_type a;
 102   allocator_type::pointer p1 = a.allocate(128);
 103   allocator_type::pointer p2 = a.allocate(5128);
 104
 105   a.deallocate(p1, 128);
 106   a.deallocate(p2, 5128);
 107
 108   return 0;
 109 }
 110 </pre>
 111
 112 <h3 class="left">
 113   <a name="init">Initialization</a>
 114 </h3>
 115
 116 <p>
 117 The static variables (pointers to freelists, tuning parameters etc)
 118 are initialized as above, or are set to the global defaults.
 119 </p>
 120
 121 <p>
 122 The very first allocate() call will always call the _S_init() function.
 123 In order to make sure that this function is called exactly once we make use
 124 of a __gthread_once (with _S_once_mt and _S_init as arguments) call in MT
 125 applications and check a static bool (_S_initialized) in ST applications.
 126 </p>
 127
 128 <p>
 129 The _S_init() function:
 130 - If the GLIBCXX_FORCE_NEW environment variable is set, it sets the bool
 131   _S_force_new to true and then returns. This will cause subsequent calls to
 132   allocate() to return memory directly from a new() call, and deallocate will
 133   only do a delete() call.
 134 </p>
 135
 136 <p>
 137 - If the GLIBCXX_FORCE_NEW environment variable is not set, both ST and MT
 138   applications will:
 139   - Calculate the number of bins needed. A bin is a specific power of two size
 140     of bytes. I.e., by default the allocator will deal with requests of up to
 141     128 bytes (or whatever the value of _S_max_bytes is when _S_init() is
 142     called). This means that there will be bins of the following sizes
 143     (in bytes): 1, 2, 4, 8, 16, 32, 64, 128.
 144
 145   - Create the _S_binmap array. All requests are rounded up to the next
 146     "large enough" bin. I.e., a request for 29 bytes will cause a block from
 147     the "32 byte bin" to be returned to the application. The purpose of
 148     _S_binmap is to speed up the process of finding out which bin to use.
 149     I.e., the value of _S_binmap[ 29 ] is initialized to 5 (bin 5 = 32 bytes).
 150 </p>
 151 <p>
 152   - Create the _S_bin array. This array consists of bin_records. There will be
 153     as many bin_records in this array as the number of bins that we calculated
 154     earlier. I.e., if _S_max_bytes = 128 there will be 8 entries.
 155     Each bin_record is then initialized:
 156     - bin_record-&gt;first = An array of pointers to block_records. There will be
 157       as many block_records pointers as there are maximum number of threads
 158       (in a ST application there is only 1 thread, in a MT application there
 159       are _S_max_threads).
 160       This holds the pointer to the first free block for each thread in this
 161       bin. I.e., if we would like to know where the first free block of size 32
 162       for thread number 3 is we would look this up by: _S_bin[ 5 ].first[ 3 ]
 163
 164     The above created block_record pointers members are now initialized to
 165     their initial values. I.e. _S_bin[ n ].first[ n ] = NULL;
 166 </p>
 167
 168 <p>
 169 - Additionally a MT application will:
 170   - Create a list of free thread id's. The pointer to the first entry
 171     is stored in _S_thread_freelist_first. The reason for this approach is
 172     that the __gthread_self() call will not return a value that corresponds to
 173     the maximum number of threads allowed but rather a process id number or
 174     something else. So what we do is that we create a list of thread_records.
 175     This list is _S_max_threads long and each entry holds a size_t thread_id
 176     which is initialized to 1, 2, 3, 4, 5 and so on up to _S_max_threads.
 177     Each time a thread calls allocate() or deallocate() we call
 178     _S_get_thread_id() which looks at the value of _S_thread_key which is a
 179     thread local storage pointer. If this is NULL we know that this is a newly
 180     created thread and we pop the first entry from this list and saves the
 181     pointer to this record in the _S_thread_key variable. The next time
 182     we will get the pointer to the thread_record back and we use the
 183     thread_record-&gt;thread_id as identification. I.e., the first thread that
 184     calls allocate will get the first record in this list and thus be thread
 185     number 1 and will then find the pointer to its first free 32 byte block
 186     in _S_bin[ 5 ].first[ 1 ]
 187     When we create the _S_thread_key we also define a destructor
 188     (_S_thread_key_destr) which means that when the thread dies, this
 189     thread_record is returned to the front of this list and the thread id
 190     can then be reused if a new thread is created.
 191     This list is protected by a mutex (_S_thread_freelist_mutex) which is only
 192     locked when records are removed/added to the list.
 193 </p>
 194 <p>
 195   - Initialize the free and used counters of each bin_record:
 196     - bin_record-&gt;free = An array of size_t. This keeps track of the number
 197       of blocks on a specific thread's freelist in each bin. I.e., if a thread
 198       has 12 32-byte blocks on it's freelists and allocates one of these, this
 199       counter would be decreased to 11.
 200
 201     - bin_record-&gt;used = An array of size_t. This keeps track of the number
 202       of blocks currently in use of this size by this thread. I.e., if a thread
 203       has made 678 requests (and no deallocations...) of 32-byte blocks this
 204       counter will read 678.
 205
 206     The above created arrays are now initialized with their initial values.
 207     I.e. _S_bin[ n ].free[ n ] = 0;
 208 </p>
 209 <p>
 210   - Initialize the mutex of each bin_record:
 211     The bin_record-&gt;mutex is used to protect the global freelist. This concept
 212     of a global freelist is explained in more detail in the section
 213     "A multi threaded example", but basically this mutex is locked whenever
 214     a block of memory is retrieved or returned to the global freelist for this
 215     specific bin. This only occurs when a number of blocks are grabbed from the
 216     global list to a thread specific list or when a thread decides to return
 217     some blocks to the global freelist.
 218 </p>
 219
 220 <h3 class="left">
 221   <a name="st_example">A single threaded example (and a primer for the multi threaded example!)</a>
 222 </h3>
 223
 224 <p>
 225 Let's start by describing how the data on a freelist is laid out in memory.
 226 This is the first two blocks in freelist for thread id 3 in bin 3 (8 bytes):
 227 </p>
 228 <pre>
 229 +----------------+
 230 | next* ---------|--+  (_S_bin[ 3 ].first[ 3 ] points here)
 231 |                |  |
 232 |                |  |
 233 |                |  |
 234 +----------------+  |
 235 | thread_id = 3  |  |
 236 |                |  |
 237 |                |  |
 238 |                |  |
 239 +----------------+  |
 240 | DATA           |  |  (A pointer to here is what is returned to the
 241 |                |  |   the application when needed)
 242 |                |  |
 243 |                |  |
 244 |                |  |
 245 |                |  |
 246 |                |  |
 247 |                |  |
 248 +----------------+  |
 249 +----------------+  |
 250 | next*          |&lt;-+  (If next == NULL it's the last one on the list)
 251 |                |
 252 |                |
 253 |                |
 254 +----------------+
 255 | thread_id = 3  |
 256 |                |
 257 |                |
 258 |                |
 259 +----------------+
 260 | DATA           |
 261 |                |
 262 |                |
 263 |                |
 264 |                |
 265 |                |
 266 |                |
 267 |                |
 268 +----------------+
 269 </pre>
 270
 271 <p>
 272 With this in mind we simplify things a bit for a while and say that there is
 273 only one thread (a ST application). In this case all operations are made to
 274 what is referred to as the global pool - thread id 0 (No thread may be
 275 assigned this id since they span from 1 to _S_max_threads in a MT application).
 276 </p>
 277 <p>
 278 When the application requests memory (calling allocate()) we first look at the
 279 requested size and if this is &gt; _S_max_bytes we call new() directly and return.
 280 </p>
 281 <p>
 282 If the requested size is within limits we start by finding out from which
 283 bin we should serve this request by looking in _S_binmap.
 284 </p>
 285 <p>
 286 A quick look at _S_bin[ bin ].first[ 0 ] tells us if there are any blocks of
 287 this size on the freelist (0). If this is not NULL - fine, just remove the
 288 block that _S_bin[ bin ].first[ 0 ] points to from the list,
 289 update _S_bin[ bin ].first[ 0 ] and return a pointer to that blocks data.
 290 </p>
 291 <p>
 292 If the freelist is empty (the pointer is NULL) we must get memory from the
 293 system and build us a freelist within this memory. All requests for new memory
 294 is made in chunks of _S_chunk_size. Knowing the size of a block_record and
 295 the bytes that this bin stores we then calculate how many blocks we can create
 296 within this chunk, build the list, remove the first block, update the pointer
 297 (_S_bin[ bin ].first[ 0 ]) and return a pointer to that blocks data.
 298 </p>
 299
 300 <p>
 301 Deallocation is equally simple; the pointer is casted back to a block_record
 302 pointer, lookup which bin to use based on the size, add the block to the front
 303 of the global freelist and update the pointer as needed
 304 (_S_bin[ bin ].first[ 0 ]).
 305 </p>
 306
 307 <p>
 308 The decision to add deallocated blocks to the front of the freelist was made
 309 after a set of performance measurements that showed that this is roughly 10%
 310 faster than maintaining a set of "last pointers" as well.
 311 </p>
 312
 313 <h3 class="left">
 314   <a name="mt_example">A multi threaded example</a>
 315 </h3>
 316
 317 <p>
 318 In the ST example we never used the thread_id variable present in each block.
 319 Let's start by explaining the purpose of this in a MT application.
 320 </p>
 321
 322 <p>
 323 The concept of "ownership" was introduced since many MT applications
 324 allocate and deallocate memory to shared containers from different
 325 threads (such as a cache shared amongst all threads). This introduces
 326 a problem if the allocator only returns memory to the current threads
 327 freelist (I.e., there might be one thread doing all the allocation and
 328 thus obtaining ever more memory from the system and another thread
 329 that is getting a longer and longer freelist - this will in the end
 330 consume all available memory).
 331 </p>
 332
 333 <p>
 334 Each time a block is moved from the global list (where ownership is
 335 irrelevant), to a threads freelist (or when a new freelist is built
 336 from a chunk directly onto a threads freelist or when a deallocation
 337 occurs on a block which was not allocated by the same thread id as the
 338 one doing the deallocation) the thread id is set to the current one.
 339 </p>
 340
 341 <p>
 342 What's the use? Well, when a deallocation occurs we can now look at
 343 the thread id and find out if it was allocated by another thread id
 344 and decrease the used counter of that thread instead, thus keeping the
 345 free and used counters correct. And keeping the free and used counters
 346 corrects is very important since the relationship between these two
 347 variables decides if memory should be returned to the global pool or
 348 not when a deallocation occurs.
 349 </p>
 350
 351 <p>
 352 When the application requests memory (calling allocate()) we first
 353 look at the requested size and if this is &gt; _S_max_bytes we call new()
 354 directly and return.
 355 </p>
 356
 357 <p>
 358 If the requested size is within limits we start by finding out from which
 359 bin we should serve this request by looking in _S_binmap.
 360 </p>
 361
 362 <p>
 363 A call to _S_get_thread_id() returns the thread id for the calling thread
 364 (and if no value has been set in _S_thread_key, a new id is assigned and
 365 returned).
 366 </p>
 367
 368 <p>
 369 A quick look at _S_bin[ bin ].first[ thread_id ] tells us if there are
 370 any blocks of this size on the current threads freelist. If this is
 371 not NULL - fine, just remove the block that _S_bin[ bin ].first[
 372 thread_id ] points to from the list, update _S_bin[ bin ].first[
 373 thread_id ], update the free and used counters and return a pointer to
 374 that blocks data.
 375 </p>
 376
 377 <p>
 378 If the freelist is empty (the pointer is NULL) we start by looking at
 379 the global freelist (0). If there are blocks available on the global
 380 freelist we lock this bins mutex and move up to block_count (the
 381 number of blocks of this bins size that will fit into a _S_chunk_size)
 382 or until end of list - whatever comes first - to the current threads
 383 freelist and at the same time change the thread_id ownership and
 384 update the counters and pointers. When the bins mutex has been
 385 unlocked, we remove the block that _S_bin[ bin ].first[ thread_id ]
 386 points to from the list, update _S_bin[ bin ].first[ thread_id ],
 387 update the free and used counters, and return a pointer to that blocks
 388 data.
 389 </p>
 390
 391 <p>
 392 The reason that the number of blocks moved to the current threads
 393 freelist is limited to block_count is to minimize the chance that a
 394 subsequent deallocate() call will return the excess blocks to the
 395 global freelist (based on the _S_freelist_headroom calculation, see
 396 below).
 397 </p>
 398
 399 <p>
 400 However if there isn't any memory on the global pool we need to get
 401 memory from the system - this is done in exactly the same way as in a
 402 single threaded application with one major difference; the list built
 403 in the newly allocated memory (of _S_chunk_size size) is added to the
 404 current threads freelist instead of to the global.
 405 </p>
 406
 407 <p>
 408 The basic process of a deallocation call is simple: always add the
 409 block to the front of the current threads freelist and update the
 410 counters and pointers (as described earlier with the specific check of
 411 ownership that causes the used counter of the thread that originally
 412 allocated the block to be decreased instead of the current threads
 413 counter).
 414 </p>
 415
 416 <p>
 417 And here comes the free and used counters to service. Each time a
 418 deallocation() call is made, the length of the current threads
 419 freelist is compared to the amount memory in use by this thread.
 420 </p>
 421
 422 <p>
 423 Let's go back to the example of an application that has one thread
 424 that does all the allocations and one that deallocates. Both these
 425 threads use say 516 32-byte blocks that was allocated during thread
 426 creation for example.  Their used counters will both say 516 at this
 427 point. The allocation thread now grabs 1000 32-byte blocks and puts
 428 them in a shared container. The used counter for this thread is now
 429 1516.
 430 </p>
 431
 432 <p>
 433 The deallocation thread now deallocates 500 of these blocks. For each
 434 deallocation made the used counter of the allocating thread is
 435 decreased and the freelist of the deallocation thread gets longer and
 436 longer. But the calculation made in deallocate() will limit the length
 437 of the freelist in the deallocation thread to _S_freelist_headroom %
 438 of it's used counter.  In this case, when the freelist (given that the
 439 _S_freelist_headroom is at it's default value of 10%) exceeds 52
 440 (516/10) blocks will be returned to the global pool where the
 441 allocating thread may pick them up and reuse them.
 442 </p>
 443
 444 <p>
 445 In order to reduce lock contention (since this requires this bins
 446 mutex to be locked) this operation is also made in chunks of blocks
 447 (just like when chunks of blocks are moved from the global freelist to
 448 a threads freelist mentioned above). The "formula" used can probably
 449 be improved to further reduce the risk of blocks being "bounced back
 450 and forth" between freelists.
 451 </p>
 452
 453 <hr />
 454 <p>Return <a href="#top">to the top of the page</a> or
 455    <a href="http://gcc.gnu.org/libstdc++/">to the libstdc++ homepage</a>.
 456 </p>
 457
 458
 459 <!-- ####################################################### -->
 460
 461 <hr />
 462 <p class="fineprint"><em>
 463 See <a href="../17_intro/license.html">license.html</a> for copying conditions.
 464 Comments and suggestions are welcome, and may be sent to
 465 <a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
 466 </em></p>
 467
 468
 469 </body>
 470 </html>