1 <?xml version="1.0" encoding="ISO-8859-1"?>
3 PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
4 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
7 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
8 <meta name="AUTHOR" content="pme@gcc.gnu.org (Phil Edwards)" />
9 <meta name="KEYWORDS" content="HOWTO, libstdc++, GCC, g++, libg++, STL" />
10 <meta name="DESCRIPTION" content="HOWTO for the libstdc++ chapter 27." />
11 <meta name="GENERATOR" content="vi and eight fingers" />
12 <title>libstdc++-v3 HOWTO: Chapter 27</title>
13 <link rel="StyleSheet" href="../lib3styles.css" />
17 <h1 class="centered"><a name="top">Chapter 27: Input/Output</a></h1>
19 <p>Chapter 27 deals with iostreams and all their subcomponents
20 and extensions. All <em>kinds</em> of fun stuff.
24 <!-- ####################################################### -->
28 <li><a href="#1">Copying a file</a></li>
29 <li><a href="#2">The buffering is screwing up my program!</a></li>
30 <li><a href="#3">Binary I/O</a></li>
31 <li><a href="#5">What is this <sstream>/stringstreams thing?</a></li>
32 <li><a href="#6">Deriving a stream buffer</a></li>
33 <li><a href="#7">More on binary I/O</a></li>
34 <li><a href="#8">Pathetic performance? Ditch C.</a></li>
35 <li><a href="#9">Threads and I/O</a></li>
40 <!-- ####################################################### -->
42 <h2><a name="1">Copying a file</a></h2>
43 <p>So you want to copy a file quickly and easily, and most important,
44 completely portably. And since this is C++, you have an open
45 ifstream (call it IN) and an open ofstream (call it OUT):
<pre>
   #include <fstream>

   std::ifstream  IN ("input_file");
   std::ofstream  OUT ("output_file"); </pre>
52 <p>Here's the easiest way to get it completely wrong:
<pre>
   OUT << IN;</pre>
<p>For those of you who don't already know why this doesn't work
(probably from having done it before), I invite you to quickly
create a simple text file called "input_file" containing
the sentence
</p>
<pre>
   The quick brown fox jumped over the lazy dog.</pre>
63 <p>surrounded by blank lines. Code it up and try it. The contents
64 of "output_file" may surprise you.
66 <p>Seriously, go do it. Get surprised, then come back. It's worth it.
69 <p>The thing to remember is that the <code>basic_[io]stream</code> classes
70 handle formatting, nothing else. In particular, they break up on
71 whitespace. The actual reading, writing, and storing of data is
72 handled by the <code>basic_streambuf</code> family. Fortunately, the
73 <code>operator<<</code> is overloaded to take an ostream and
74 a pointer-to-streambuf, in order to help with just this kind of
75 "dump the data verbatim" situation.
77 <p>Why a <em>pointer</em> to streambuf and not just a streambuf? Well,
78 the [io]streams hold pointers (or references, depending on the
79 implementation) to their buffers, not the actual
80 buffers. This allows polymorphic behavior on the part of the buffers
81 as well as the streams themselves. The pointer is easily retrieved
82 using the <code>rdbuf()</code> member function. Therefore, the easiest
83 way to copy the file is:
<pre>
   OUT << IN.rdbuf();</pre>
<p>So what <em>was</em> happening with OUT<<IN?  The result
depends on the library, since that particular << isn't defined by
the Standard.  I have seen instances where it is implemented, but the
character extraction process removes all the whitespace, leaving you
with no blank lines and only "Thequickbrownfox...".  With
libraries that do not define that operator, IN (or one of IN's
member pointers) sometimes gets converted to a void*, and the output
file then contains a perfect text representation of a hexadecimal
address (quite a big surprise).  Others don't compile at all; since
C++11 a stream's conversion to bool is explicit, so a conforming
library rejects the statement.
</p>
<p>Also note that none of this is specific to o<b>f</b>streams.
The operators shown above are all defined in the parent
basic_ostream class and are therefore available with all possible
descendants.
</p>
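<p>As a minimal sketch of the <code>rdbuf()</code> technique, the
following wraps the copy in a function that accepts any istream and
ostream.  The names <code>copy_stream</code> and
<code>copy_via_rdbuf</code> are made up for this illustration; string
streams stand in for the files so the result is easy to inspect.
</p>

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copy everything left in `in` to `out` verbatim.
// Returns false if the insertion failed (for example, nothing was copied).
bool copy_stream(std::istream& in, std::ostream& out)
{
    out << in.rdbuf();   // no formatting: blank lines and spaces survive
    return !out.fail();
}

// The same call works on string streams, because operator<< for a
// streambuf pointer belongs to basic_ostream and not to ofstream.
std::string copy_via_rdbuf(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    copy_stream(in, out);
    return out.str();
}
```

<p>Passing an open <code>std::ifstream</code> and
<code>std::ofstream</code> to <code>copy_stream</code> copies a file
in the same way.
</p>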
102 <p>Return <a href="#top">to top of page</a> or
103 <a href="../faq/index.html">to the FAQ</a>.
107 <h2><a name="2">The buffering is screwing up my program!</a></h2>
109 This is not written very well. I need to redo this section.
111 <p>First, are you sure that you understand buffering? Particularly
112 the fact that C++ may not, in fact, have anything to do with it?
114 <p>The rules for buffering can be a little odd, but they aren't any
115 different from those of C. (Maybe that's why they can be a bit
116 odd.) Many people think that writing a newline to an output
117 stream automatically flushes the output buffer. This is true only
118 when the output stream is, in fact, a terminal and not a file
119 or some other device -- and <em>that</em> may not even be true
120 since C++ says nothing about files nor terminals. All of that is
121 system-dependent. (The "newline-buffer-flushing only occurring
122 on terminals" thing is mostly true on Unix systems, though.)
124 <p>Some people also believe that sending <code>endl</code> down an
125 output stream only writes a newline. This is incorrect; after a
126 newline is written, the buffer is also flushed. Perhaps this
127 is the effect you want when writing to a screen -- get the text
128 out as soon as possible, etc -- but the buffering is largely
129 wasted when doing this to a file:
<pre>
   output << "a line of text" << endl;
   output << some_data_variable << endl;
   output << "another line of text" << endl; </pre>
<p>The proper thing to do in this case is to just write the data out
and let the libraries and the system worry about the buffering.
If you need a newline, just write a newline:
</p>
<pre>
   output << "a line of text\n"
          << some_data_variable << '\n'
          << "another line of text\n"; </pre>
<p>I have also joined the output statements into a single statement.
You could make the code prettier by moving the single newline to
the start of the quoted text on the third line, for example.
</p>
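<p>One way to see the difference between <code>endl</code> and a
plain newline is to count how often the stream asks its buffer to
flush.  This is only a sketch: <code>counting_buf</code> and the two
functions are invented for the illustration, and an in-memory buffer
stands in for a file.
</p>

```cpp
#include <ostream>
#include <sstream>

// Hypothetical helper: a string buffer that counts flush requests.
class counting_buf : public std::stringbuf {
public:
    int syncs = 0;
protected:
    // Called through pubsync() whenever the owning stream is flushed.
    int sync() override { ++syncs; return 0; }
};

// Three lines ended with endl: each endl writes '\n' and then flushes.
int flushes_with_endl()
{
    counting_buf buf;
    std::ostream output(&buf);
    output << "a line of text" << std::endl;
    output << 42 << std::endl;
    output << "another line of text" << std::endl;
    return buf.syncs;
}

// The same three lines ended with '\n': nothing forces a flush.
int flushes_with_newline()
{
    counting_buf buf;
    std::ostream output(&buf);
    output << "a line of text\n"
           << 42 << '\n'
           << "another line of text\n";
    return buf.syncs;
}
```

<p>With a real file each of those flushes may turn into a system call,
which is where the cost comes from.
</p>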
<p>If you do need to flush the buffer above, you can send an
<code>endl</code> if you also need a newline, or just flush the buffer
if you don't:
</p>
<pre>
   output << ...... << flush;    // can use std::flush manipulator
   output.flush();               // or call a member fn </pre>
154 <p>On the other hand, there are times when writing to a file should
155 be like writing to standard error; no buffering should be done
156 because the data needs to appear quickly (a prime example is a
157 log file for security-related information). The way to do this is
158 just to turn off the buffering <em>before any I/O operations at
159 all</em> have been done, i.e., as soon as possible after opening:
<pre>
   std::ofstream    os ("/foo/bar/baz");
   std::ifstream    is ("/qux/quux/quuux");
   int   i;

   os.rdbuf()->pubsetbuf(0,0);
   is.rdbuf()->pubsetbuf(0,0);

   os << "this data is written immediately\n";
   is >> i;   // and this will probably cause a disk read </pre>
171 <p>Since all aspects of buffering are handled by a streambuf-derived
172 member, it is necessary to get at that member with <code>rdbuf()</code>.
173 Then the public version of <code>setbuf</code> can be called. The
174 arguments are the same as those for the Standard C I/O Library
175 function (a buffer area followed by its size).
<p>A great deal of this is implementation-dependent.  For example,
<code>streambuf</code> does not specify any actions for its own
<code>setbuf()</code>-ish functions; the classes derived from
<code>streambuf</code> each define behavior that "makes
sense" for that class:  an argument of (0,0) turns off buffering
for <code>filebuf</code> but has no effect on its sibling
<code>stringbuf</code>, and specifying anything other than (0,0) has
varying effects.  Other user-defined classes derived from streambuf can
do whatever they want.  (For <code>filebuf</code> and arguments for
<code>(p,s)</code> other than zeros, libstdc++ does what you'd expect:
the first <code>s</code> bytes of <code>p</code> are used as a buffer,
which you must allocate and deallocate.)
</p>
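<p>Here is a hedged sketch of both uses of <code>pubsetbuf</code> on
a <code>filebuf</code>: handing over a caller-owned buffer, and asking
for no buffering at all.  The function names and the 4096-byte size are
assumptions made for the example.  The request is made before
<code>open()</code>, which is the conservative order; some
implementations may ignore a request made on a file that is already
open.
</p>

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write `text` to `path` through a buffer we own.
bool write_with_own_buffer(const char* path, const std::string& text)
{
    std::vector<char> buffer(4096);   // declared first so it outlives `os`
    std::ofstream os;
    os.rdbuf()->pubsetbuf(buffer.data(),
                          static_cast<std::streamsize>(buffer.size()));
    os.open(path);
    if (!os)
        return false;
    os << text;
    os.close();                       // flushes whatever is still buffered
    return !os.fail();
}

// Hypothetical helper: read the whole file back with buffering turned off.
std::string read_unbuffered(const char* path)
{
    std::ifstream is;
    is.rdbuf()->pubsetbuf(nullptr, 0);   // the (0,0) request
    is.open(path);
    std::ostringstream contents;
    contents << is.rdbuf();
    return contents.str();
}
```

<p>Whether the library really uses the buffer you pass is up to the
implementation, so treat the call as a request rather than a guarantee.
</p>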
190 <p>A last reminder: there are usually more buffers involved than
191 just those at the language/library level. Kernel buffers, disk
192 buffers, and the like will also have an effect. Inspecting and
193 changing those are system-dependent.
200 <h2><a name="3">Binary I/O</a></h2>
201 <p>The first and most important thing to remember about binary I/O is
202 that opening a file with <code>ios::binary</code> is not, repeat
203 <em>not</em>, the only thing you have to do. It is not a silver
204 bullet, and will not allow you to use the <code><</>></code>
205 operators of the normal fstreams to do binary I/O.
207 <p>Sorry. Them's the breaks.
209 <p>This isn't going to try and be a complete tutorial on reading and
210 writing binary files (because "binary"
211 <a href="#7">covers a lot of ground)</a>, but we will try and clear
212 up a couple of misconceptions and common errors.
214 <p>First, <code>ios::binary</code> has exactly one defined effect, no more
215 and no less. Normal text mode has to be concerned with the newline
216 characters, and the runtime system will translate between (for
217 example) '\n' and the appropriate end-of-line sequence (LF on Unix,
218 CRLF on DOS, CR on Macintosh, etc). (There are other things that
219 normal mode does, but that's the most obvious.) Opening a file in
220 binary mode disables this conversion, so reading a CRLF sequence
221 under Windows won't accidentally get mapped to a '\n' character, etc.
222 Binary mode is not supposed to suddenly give you a bitstream, and
223 if it is doing so in your program then you've discovered a bug in
224 your vendor's compiler (or some other part of the C++ implementation,
225 possibly the runtime system).
<p>Second, using <code><<</code> to write and <code>>></code> to
read isn't going to work with the standard file stream classes, even
if you use <code>skipws</code> during reading.  Why not?  Because
ifstream and ofstream exist for the purpose of <em>formatting</em>,
not reading and writing.  Their job is to interpret the data into
text characters, and that's exactly what you don't want to happen
during binary I/O.
</p>
<p>Third, using the <code>get()</code> and <code>put()/write()</code> member
functions still isn't guaranteed to help you.  These are
"unformatted" I/O functions, but still character-based.
(This may or may not be what you want, see below.)
</p>
240 <p>Notice how all the problems here are due to the inappropriate use
241 of <em>formatting</em> functions and classes to perform something
242 which <em>requires</em> that formatting not be done? There are a
243 seemingly infinite number of solutions, and a few are listed here:
246 <li>"Derive your own fstream-type classes and write your own
247 <</>> operators to do binary I/O on whatever data
248 types you're using." This is a Bad Thing, because while
249 the compiler would probably be just fine with it, other humans
250 are going to be confused. The overloaded bitshift operators
251 have a well-defined meaning (formatting), and this breaks it.
253 <li>"Build the file structure in memory, then <code>mmap()</code>
254 the file and copy the structure." Well, this is easy to
255 make work, and easy to break, and is pretty equivalent to
256 using <code>::read()</code> and <code>::write()</code> directly, and
257 makes no use of the iostream library at all...
259 <li>"Use streambufs, that's what they're there for."
260 While not trivial for the beginner, this is the best of all
261 solutions. The streambuf/filebuf layer is the layer that is
262 responsible for actual I/O. If you want to use the C++
263 library for binary I/O, this is where you start.
266 <p>How to go about using streambufs is a bit beyond the scope of this
267 document (at least for now), but while streambufs go a long way,
268 they still leave a couple of things up to you, the programmer.
269 As an example, byte ordering is completely between you and the
270 operating system, and you have to handle it yourself.
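<p>To illustrate what "handling it yourself" can look like, this
sketch writes a 32-bit value through a streambuf one byte at a time in
a fixed (big-endian) order, so the bytes in the file do not depend on
the host.  The function names are invented for the example, and the
choice of big-endian is an assumption; any order works as long as
reader and writer agree.
</p>

```cpp
#include <cstdint>
#include <sstream>
#include <streambuf>

typedef std::streambuf::traits_type sb_traits;

// Write `value` as four bytes, most significant byte first.
bool put_u32_be(std::streambuf& sb, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8) {
        const char byte = static_cast<char>((value >> shift) & 0xFF);
        if (sb_traits::eq_int_type(sb.sputc(byte), sb_traits::eof()))
            return false;
    }
    return true;
}

// Read four bytes written by put_u32_be and rebuild the value.
bool get_u32_be(std::streambuf& sb, std::uint32_t& value)
{
    std::uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const sb_traits::int_type c = sb.sbumpc();
        if (sb_traits::eq_int_type(c, sb_traits::eof()))
            return false;
        result = (result << 8)
               | static_cast<unsigned char>(sb_traits::to_char_type(c));
    }
    value = result;
    return true;
}

// Round trip through an in-memory buffer; a filebuf works the same way.
std::uint32_t round_trip(std::uint32_t value)
{
    std::stringbuf sb;
    std::uint32_t back = 0;
    put_u32_be(sb, value);
    get_u32_be(sb, back);
    return back;
}
```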
272 <p>Deriving a streambuf or filebuf
273 class from the standard ones, one that is specific to your data
274 types (or an abstraction thereof) is probably a good idea, and
275 lots of examples exist in journals and on Usenet. Using the
276 standard filebufs directly (either by declaring your own or by
277 using the pointer returned from an fstream's <code>rdbuf()</code>)
278 is certainly feasible as well.
280 <p>One area that causes problems is trying to do bit-by-bit operations
281 with filebufs. C++ is no different from C in this respect: I/O
282 must be done at the byte level. If you're trying to read or write
283 a few bits at a time, you're going about it the wrong way. You
284 must read/write an integral number of bytes and then process the
285 bytes. (For example, the streambuf functions take and return
286 variables of type <code>int_type</code>.)
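<p>As a rough sketch of "read whole bytes, then process them", the
class below pulls one byte at a time from a streambuf and hands out
its bits.  <code>bit_reader</code> and <code>bits_of</code> are
hypothetical names, and the most-significant-bit-first order is an
assumption chosen for the example.
</p>

```cpp
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical helper: yields bits most-significant-first, fetching a
// whole byte from the streambuf whenever the current one is used up.
class bit_reader {
public:
    explicit bit_reader(std::streambuf& source)
        : source_(source), current_(0), remaining_(0) {}

    // Returns 0 or 1, or -1 once the underlying buffer is exhausted.
    int next_bit()
    {
        typedef std::streambuf::traits_type traits;
        if (remaining_ == 0) {
            const traits::int_type c = source_.sbumpc();
            if (traits::eq_int_type(c, traits::eof()))
                return -1;
            current_ = static_cast<unsigned char>(traits::to_char_type(c));
            remaining_ = 8;
        }
        --remaining_;
        return (current_ >> remaining_) & 1;
    }

private:
    std::streambuf& source_;
    unsigned char   current_;
    int             remaining_;
};

// Render every bit of `bytes` as a string of '0' and '1' characters.
std::string bits_of(const std::string& bytes)
{
    std::stringbuf sb(bytes);
    bit_reader reader(sb);
    std::string out;
    for (int bit = reader.next_bit(); bit != -1; bit = reader.next_bit())
        out += static_cast<char>('0' + bit);
    return out;
}
```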
288 <p>Another area of problems is opening text files in binary mode.
289 Generally, binary mode is intended for binary files, and opening
290 text files in binary mode means that you now have to deal with all of
291 those end-of-line and end-of-file problems that we mentioned before.
292 An instructive thread from comp.lang.c++.moderated delved off into
293 this topic starting more or less at
294 <a href="http://www.deja.com/getdoc.xp?AN=436187505">this</a>
article and continuing to the end of the thread.  (You'll have to
sort through some flames every couple of paragraphs, but the points
made are good ones.)
301 <h2><a name="5">What is this <sstream>/stringstreams thing?</a></h2>
302 <p>Stringstreams (defined in the header <code><sstream></code>)
303 are in this author's opinion one of the coolest things since
304 sliced time. An example of their use is in the Received Wisdom
305 section for Chapter 21 (Strings),
<a href="../21_strings/howto.html#1.1internal"> describing how to
format strings</a>.
309 <p>The quick definition is: they are siblings of ifstream and ofstream,
310 and they do for <code>std::string</code> what their siblings do for
311 files. All that work you put into writing <code><<</code> and
312 <code>>></code> functions for your classes now pays off
313 <em>again!</em> Need to format a string before passing the string
314 to a function? Send your stuff via <code><<</code> to an
315 ostringstream. You've read a string as input and need to parse it?
316 Initialize an istringstream with that string, and then pull pieces
out of it with <code>>></code>.  Have a stringstream and need to
get a copy of the string inside?  Just call the <code>str()</code>
member function.
321 <p>This only works if you've written your
322 <code><<</code>/<code>>></code> functions correctly, though,
323 and correctly means that they take istreams and ostreams as
324 parameters, not i<b>f</b>streams and o<b>f</b>streams. If they
325 take the latter, then your I/O operators will work fine with
326 file streams, but with nothing else -- including stringstreams.
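<p>A small sketch of operators written the "correct" way, taking
<code>std::ostream</code> and <code>std::istream</code>, and then
reused with string streams.  The <code>point</code> type, its
<code>(x,y)</code> text format, and the helper names are all invented
for this example.
</p>

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

struct point { int x; int y; };   // hypothetical user type

// Takes a plain ostream, so it works with files, strings, and cout.
std::ostream& operator<<(std::ostream& os, const point& p)
{
    return os << '(' << p.x << ',' << p.y << ')';
}

// Takes a plain istream; sets failbit if the punctuation is wrong.
std::istream& operator>>(std::istream& is, point& p)
{
    char open = 0, comma = 0, close = 0;
    point tmp = {0, 0};
    if (is >> open >> tmp.x >> comma >> tmp.y >> close) {
        if (open == '(' && comma == ',' && close == ')')
            p = tmp;
        else
            is.setstate(std::ios_base::failbit);
    }
    return is;
}

// Format into a string with an ostringstream.
std::string to_text(const point& p)
{
    std::ostringstream out;
    out << "p = " << p;
    return out.str();
}

// Parse from a string with an istringstream.
bool from_text(const std::string& text, point& p)
{
    std::istringstream in(text);
    return !(in >> p).fail();
}
```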
328 <p>If you are a user of the strstream classes, you need to update
329 your code. You don't have to explicitly append <code>ends</code> to
330 terminate the C-style character array, you don't have to mess with
331 "freezing" functions, and you don't have to manage the
332 memory yourself. The strstreams have been officially deprecated,
333 which means that 1) future revisions of the C++ Standard won't
334 support them, and 2) if you use them, people will laugh at you.
338 <h2><a name="6">Deriving a stream buffer</a></h2>
339 <p>Creating your own stream buffers for I/O can be remarkably easy.
If you are interested in doing so, we highly recommend two very
excellent books:
342 <a href="http://home.camelot.de/langer/iostreams.htm">Standard C++
343 IOStreams and Locales</a> by Langer and Kreft, ISBN 0-201-18395-1, and
344 <a href="http://www.josuttis.com/libbook/">The C++ Standard Library</a>
345 by Nicolai Josuttis, ISBN 0-201-37926-0. Both are published by
346 Addison-Wesley, who isn't paying us a cent for saying that, honest.
348 <p>Here is a simple example, io/outbuf1, from the Josuttis text. It
349 transforms everything sent through it to uppercase. This version
350 assumes many things about the nature of the character type being
351 used (for more information, read the books or the newsgroups):
<pre>
    #include <iostream>
    #include <streambuf>
    #include <locale>
    #include <cstdio>

    class outbuf : public std::streambuf
    {
      protected:
        /* central output function
         * - print characters in uppercase mode
         */
        virtual int_type overflow (int_type c) {
            if (c != EOF) {
                // convert lowercase to uppercase
                c = std::toupper(static_cast<char>(c),getloc());

                // and write the character to the standard output
                if (putchar(c) == EOF) {
                    return EOF;
                }
            }
            return c;
        }
    };

    int main()
    {
        // create special output buffer
        outbuf ob;
        // initialize output stream with that output buffer
        std::ostream out(&ob);

        out << "31 hexadecimal: "
            << std::hex << 31 << std::endl;
        return 0;
    }
</pre>
391 <p>Try it yourself! More examples can be found in 3.1.x code, in
392 <code>include/ext/*_filebuf.h</code>.
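<p>A variation on the same idea, offered as a sketch and not taken
from either book: instead of calling <code>putchar</code>, the buffer
forwards each converted character to another streambuf.  That makes
it usable as a filter in front of a file or a string.  The names
<code>upper_buf</code> and <code>uppercase_demo</code> are made up
here, and the code still assumes plain <code>char</code>.
</p>

```cpp
#include <locale>
#include <ostream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: uppercases each character and passes it on.
class upper_buf : public std::streambuf {
public:
    explicit upper_buf(std::streambuf* destination) : dest_(destination) {}

protected:
    // No put area is set up, so every character arrives here.
    int_type overflow(int_type c) override
    {
        if (traits_type::eq_int_type(c, traits_type::eof()))
            return traits_type::not_eof(c);
        const char upper = std::toupper(traits_type::to_char_type(c), getloc());
        return dest_->sputc(upper);
    }

    // Pass flush requests through to the real destination.
    int sync() override { return dest_->pubsync(); }

private:
    std::streambuf* dest_;
};

// Same output statement as above, captured in a string.
std::string uppercase_demo()
{
    std::stringbuf sink;
    upper_buf filter(&sink);
    std::ostream out(&filter);
    out << "31 hexadecimal: " << std::hex << 31 << std::endl;
    return sink.str();
}
```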
396 <h2><a name="7">More on binary I/O</a></h2>
397 <p>Towards the beginning of February 2001, the subject of
398 "binary" I/O was brought up in a couple of places at the
399 same time. One notable place was Usenet, where James Kanze and
400 Dietmar Kühl separately posted articles on why attempting
401 generic binary I/O was not a good idea. (Here are copies of
402 <a href="binary_iostreams_kanze.txt">Kanze's article</a> and
403 <a href="binary_iostreams_kuehl.txt">Kühl's article</a>.)
405 <p>Briefly, the problems of byte ordering and type sizes mean that
406 the unformatted functions like <code>ostream::put()</code> and
407 <code>istream::get()</code> cannot safely be used to communicate
408 between arbitrary programs, or across a network, or from one
409 invocation of a program to another invocation of the same program
410 on a different platform, etc.
412 <p>The entire Usenet thread is instructive, and took place under the
413 subject heading "binary iostreams" on both comp.std.c++
414 and comp.lang.c++.moderated in parallel. Also in that thread,
415 Dietmar Kühl mentioned that he had written a pair of stream
416 classes that would read and write XDR, which is a good step towards
417 a portable binary format.
421 <h2><a name="8">Pathetic performance? Ditch C.</a></h2>
422 <p>It sounds like a flame on C, but it isn't. Really. Calm down.
423 I'm just saying it to get your attention.
425 <p>Because the C++ library includes the C library, both C-style and
426 C++-style I/O have to work at the same time. For example:
<pre>
    #include <iostream>
    #include <cstdio>

    int main()
    {
        std::cout << "Hel";
        std::printf ("lo, worl");
        std::cout << "d!\n";
    }
</pre>
436 <p>This must do what you think it does.
438 <p>Alert members of the audience will immediately notice that buffering
439 is going to make a hash of the output unless special steps are taken.
441 <p>The special steps taken by libstdc++, at least for version 3.0,
442 involve doing very little buffering for the standard streams, leaving
443 most of the buffering to the underlying C library. (This kind of
444 thing is <a href="../explanations.html#cstdio">tricky to get right</a>.)
445 The upside is that correctness is ensured. The downside is that
446 writing through <code>cout</code> can quite easily lead to awful
447 performance when the C++ I/O library is layered on top of the C I/O
448 library (as it is for 3.0 by default). Some patches have been applied
449 which improve the situation for 3.1.
451 <p>However, the C and C++ standard streams only need to be kept in sync
452 when both libraries' facilities are in use. If your program only uses
453 C++ I/O, then there's no need to sync with the C streams. The right
454 thing to do in this case is to call
<pre>
    #include <em>any of the I/O headers such as ios, iostream, etc</em>

    std::ios::sync_with_stdio(false);
</pre>
461 <p>You must do this before performing any I/O via the C++ stream objects.
462 Once you call this, the C++ streams will operate independently of the
463 (unused) C streams. For GCC 3.x, this means that <code>cout</code> and
464 company will become fully buffered on their own.
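<p>Put together, a program that only uses C++ I/O might be arranged
like this sketch.  The helper names are hypothetical;
<code>sync_with_stdio</code> itself returns the previous setting,
which is what the first helper passes back.
</p>

```cpp
#include <iostream>

// Hypothetical helper: call it first thing in main(), before any I/O.
// Returns true if the streams were still synchronized with C stdio.
bool detach_from_stdio()
{
    return std::ios::sync_with_stdio(false);
}

// Plain C++ output afterwards; '\n' leaves the flushing to the library.
void report(int count)
{
    for (int i = 1; i <= count; ++i)
        std::cout << "line " << i << '\n';
    std::cout.flush();   // one explicit flush at the end
}
```

<p>After the call, avoid mixing in <code>printf</code> and friends on
the same output, since the ordering is no longer guaranteed.
</p>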
<p>Note, by the way, that the synchronization requirement only applies to
the standard streams (<code>cin</code>, <code>cout</code>,
<code>cerr</code>,
<code>clog</code>, and their wide-character counterparts).  File stream
objects that you declare yourself have no such requirement and are fully
buffered.
</p>
475 <h2><a name="9">Threads and I/O</a></h2>
<p>I'll assume that you have already read the
<a href="../17_intro/howto.html#3">general notes on library threads</a>,
and the
<a href="../23_containers/howto.html#3">notes on threaded container
access</a> (you might not think of an I/O stream as a container, but
the points made there also hold here).  If you have not read them,
please do so first.
</p>
<p>This gets a bit tricky.  Please read carefully, and bear with me.
</p>

<h3>Structure</h3>
487 <p>As described <a href="../explanations.html#cstdio">here</a>, a wrapper
488 type called <code>__basic_file</code> provides our abstraction layer
489 for the <code>std::filebuf</code> classes. Nearly all decisions dealing
490 with actual input and output must be made in <code>__basic_file</code>.
492 <p>A generic locking mechanism is somewhat in place at the filebuf layer,
493 but is not used in the current code. Providing locking at any higher
494 level is akin to providing locking within containers, and is not done
495 for the same reasons (see the links above).
497 <h3>The defaults for 3.0.x</h3>
498 <p>The __basic_file type is simply a collection of small wrappers around
499 the C stdio layer (again, see the link under Structure). We do no
500 locking ourselves, but simply pass through to calls to <code>fopen</code>,
501 <code>fwrite</code>, and so forth.
503 <p>So, for 3.0, the question of "is multithreading safe for I/O"
504 must be answered with, "is your platform's C library threadsafe
505 for I/O?" Some are by default, some are not; many offer multiple
506 implementations of the C library with varying tradeoffs of threadsafety
507 and efficiency. You, the programmer, are always required to take care
508 with multiple threads.
<p>(As an example, the POSIX standard requires that C stdio FILE*
operations are atomic.  POSIX-conforming C libraries (e.g., on Solaris
and GNU/Linux) have an internal mutex to serialize operations on
FILE*s.  However, you still need to not do stupid things like calling
<code>fclose(fs)</code> in one thread followed by an access of
<code>fs</code> in another.)
</p>
517 <p>So, if your platform's C library is threadsafe, then your
518 <code>fstream</code> I/O operations will be threadsafe at the lowest
519 level. For higher-level operations, such as manipulating the data
520 contained in the stream formatting classes (e.g., setting up callbacks
521 inside an <code>std::ofstream</code>), you need to guard such accesses
522 like any other critical shared resource.
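<p>One possible way to do that guarding, sketched with
<code>std::mutex</code> from C++11 (an assumption on our part, since
it postdates the library versions discussed here).  The class name
<code>locked_log</code> is hypothetical.  Every thread goes through
<code>write_line</code>, so each line reaches the stream in one piece.
</p>

```cpp
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical wrapper: serializes whole-line writes to a shared stream.
class locked_log {
public:
    explicit locked_log(std::ostream& sink) : sink_(sink) {}

    void write_line(const std::string& text)
    {
        std::lock_guard<std::mutex> hold(guard_);   // released on return
        sink_ << text << '\n';
    }

private:
    std::mutex    guard_;
    std::ostream& sink_;
};

// Single-threaded demonstration; threads would share one locked_log.
std::string locked_log_demo()
{
    std::ostringstream sink;
    locked_log log(sink);
    log.write_line("one");
    log.write_line("two");
    return sink.str();
}
```

<p>The lock protects only calls made through the wrapper; code that
writes to the same stream directly bypasses it.
</p>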
<h3>The future</h3>
<p>As already mentioned <a href="../explanations.html#cstdio">here</a>, a
second choice is available for I/O implementations:  libio.  This is
disabled by default, and in fact will not currently work due to other
issues.  It will be revisited, however.
</p>
530 <p>The libio code is a subset of the guts of the GNU libc (glibc) I/O
531 implementation. When libio is in use, the <code>__basic_file</code>
532 type is basically derived from FILE. (The real situation is more
533 complex than that... it's derived from an internal type used to
534 implement FILE. See libio/libioP.h to see scary things done with
535 vtbls.) The result is that there is no "layer" of C stdio
536 to go through; the filebuf makes calls directly into the same
537 functions used to implement <code>fread</code>, <code>fwrite</code>,
538 and so forth, using internal data structures. (And when I say
539 "makes calls directly," I mean the function is literally
replaced by a jump into an internal function.  Fast but frightening.)
543 <p>Also, the libio internal locks are used. This requires pulling in
544 large chunks of glibc, such as a pthreads implementation, and is one
545 of the issues preventing widespread use of libio as the libstdc++
546 cstdio implementation.
548 <p>But we plan to make this work, at least as an option if not a future
549 default. Platforms running a copy of glibc with a recent-enough
550 version will see calls from libstdc++ directly into the glibc already
551 installed. For other platforms, a copy of the libio subsection will
552 be built and included in libstdc++.
554 <h3>Alternatives</h3>
<p>Don't forget that other cstdio implementations are possible.  You could
easily write one to perform your own forms of locking, to solve your
"interesting" problems.
</p>
561 <!-- ####################################################### -->
<p class="fineprint"><em>
See <a href="../17_intro/license.html">license.html</a> for copying conditions.
Comments and suggestions are welcome, and may be sent to
<a href="mailto:libstdc++@gcc.gnu.org">the libstdc++ mailing list</a>.
</em></p>