<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
<META NAME="AUTHOR" CONTENT="pme@sources.redhat.com (Phil Edwards)">
<META NAME="KEYWORDS" CONTENT="HOWTO, libstdc++, GCC, g++, libg++, STL">
<META NAME="DESCRIPTION" CONTENT="HOWTO for the libstdc++ chapter 27.">
<META NAME="GENERATOR" CONTENT="vi and eight fingers">
<TITLE>libstdc++-v3 HOWTO: Chapter 27</TITLE>
<LINK REL=StyleSheet HREF="../lib3styles.css">
<!-- $Id: howto.html,v 1.4 2000/11/29 20:37:02 pme Exp $ -->
<H1 CLASS="centered"><A NAME="top">Chapter 27: Input/Output</A></H1>

<P>Chapter 27 deals with iostreams and all their subcomponents
and extensions. All <EM>kinds</EM> of fun stuff.

<!-- ####################################################### -->
<UL>
<LI><A HREF="#1">Copying a file</A>
<LI><A HREF="#2">The buffering is screwing up my program!</A>
<LI><A HREF="#3">Binary I/O</A>
<LI><A HREF="#4">Iostreams class hierarchy diagram</A>
<LI><A HREF="#5">What is this &lt;sstream&gt;/stringstreams thing?</A>
</UL>
<!-- ####################################################### -->

<H2><A NAME="1">Copying a file</A></H2>
<P>So you want to copy a file quickly and easily, and, most importantly,
completely portably. And since this is C++, you have an open
ifstream (call it IN) and an open ofstream (call it OUT):
<PRE>
#include &lt;fstream&gt;

std::ifstream  IN ("input_file");
std::ofstream  OUT ("output_file"); </PRE>
<P>Here's the easiest way to get it completely wrong:
<PRE>
OUT &lt;&lt; IN;</PRE>
For those of you who don't already know why this doesn't work
(probably from having done it before), I invite you to quickly
create a simple text file called "input_file" containing the sentence
<PRE>
The quick brown fox jumped over the lazy dog.</PRE>
surrounded by blank lines. Code it up and try it. The contents
of "output_file" may surprise you.

<P>Seriously, go do it. Get surprised, then come back. It's worth it.
<P>The thing to remember is that the <TT>basic_[io]stream</TT> classes
handle formatting, nothing else. In particular, formatted input
treats whitespace as a separator between values rather than as data,
and skips it. The actual reading, writing, and storing of data is
handled by the <TT>basic_streambuf</TT> family. Fortunately,
<TT>operator&lt;&lt;</TT> is overloaded to take an ostream and a
pointer-to-streambuf, in order to help with just this kind of
"dump the data verbatim" situation.
<P>Why a <EM>pointer</EM> to streambuf and not just a streambuf? Well,
the [io]streams hold pointers (or references, depending on the
implementation) to their buffers, not the actual buffers. This
allows polymorphic behavior on the part of the buffers as well as
the streams themselves. The pointer is easily retrieved using the
<TT>rdbuf()</TT> member function. Therefore, the easiest way to copy
the file is:
<PRE>
OUT &lt;&lt; IN.rdbuf();</PRE>
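<P>The one-liner above can be wrapped up with the error checking it
leaves out. What follows is a minimal sketch, not part of any library.
The function name <TT>copy_file</TT> is made up for the example. It
opens the source first, so a missing input file does not leave behind
an empty output file. It also treats an empty source specially:
inserting a streambuf that yields no characters sets <TT>failbit</TT>
on the output stream, even though nothing went wrong.
<PRE>
#include &lt;fstream&gt;
#include &lt;string&gt;

// Hypothetical helper: copies the contents of the file 'from' into
// the file 'to'.  Returns true on success.
bool copy_file (const std::string&amp; from, const std::string&amp; to)
{
    std::ifstream  IN (from.c_str());
    if (!IN)
        return false;                // source could not be opened
    std::ofstream  OUT (to.c_str());
    if (!OUT)
        return false;                // destination could not be opened

    // An empty source would leave failbit set on OUT, so handle it here.
    if (IN.peek() == std::ifstream::traits_type::eof())
        return true;

    OUT &lt;&lt; IN.rdbuf();              // dump the data verbatim
    OUT.close();                     // flush now so write errors show up
    return !OUT.fail();
}</PRE>
On systems that distinguish text files from binary files, you may also
want to pass <TT>std::ios::binary</TT> to both constructors so that the
copy is byte-for-byte (see the Binary I/O section below).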
<P>So what <EM>was</EM> happening with OUT&lt;&lt;IN? Nothing portable,
since the Standard does not define that particular &lt;&lt;.
I have seen libraries that provide it anyway, but the character
extraction process removes all the whitespace, leaving you with no
blank lines and only "Thequickbrownfox...". With libraries that do
not define that operator, IN (or one of IN's member pointers)
sometimes gets converted to a void*. The output file then contains a
perfect text representation of a hexadecimal address (quite a big
surprise). With still others, the code does not compile at all.
<P>Also note that none of this is specific to o<B>*f*</B>streams.
The operators shown above are all defined in the parent
basic_ostream class and are therefore available with all possible
descendants.
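<P>Because the streambuf inserter belongs to <TT>basic_ostream</TT>,
the destination does not have to be a file. Here is a minimal sketch
of the same <TT>rdbuf()</TT> trick that reads a whole file into a
<TT>std::string</TT> by way of an ostringstream, keeping every blank
line and space intact. The helper name <TT>slurp</TT> is hypothetical.
<PRE>
#include &lt;fstream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

// Hypothetical helper: returns the entire contents of the named file.
std::string slurp (const char* name)
{
    std::ifstream       in (name);
    std::ostringstream  contents;
    if (in)
        contents &lt;&lt; in.rdbuf();     // same inserter as OUT &lt;&lt; IN.rdbuf()
    // A missing or empty file leaves 'contents' empty (an empty file also
    // sets failbit on it, which does not matter here).
    return contents.str();
}</PRE>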
<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.

<H2><A NAME="2">The buffering is screwing up my program!</A></H2>
This is not written very well. I need to redo this section.

<P>First, are you sure that you understand buffering? Particularly
the fact that C++ may not, in fact, have anything to do with it?
<P>The rules for buffering can be a little odd, but they aren't any
different from those of C. (Maybe that's why they can be a bit
odd.) Many people think that writing a newline to an output
stream automatically flushes the output buffer. This is true only
when the output stream is, in fact, a terminal and not a file
or some other device -- and <EM>that</EM> may not even be true,
since C++ says nothing about files or terminals. All of that is
system-dependent. (The "newline-buffer-flushing only occurring
on terminals" thing is mostly true on Unix systems, though.)
<P>Some people also believe that sending <TT>endl</TT> down an
output stream only writes a newline. This is incorrect; after a
newline is written, the buffer is also flushed. Perhaps this
is the effect you want when writing to a screen -- get the text
out as soon as possible, etc. -- but when writing to a file, every
<TT>endl</TT> forces a write and the buffering is largely wasted:
<PRE>
output &lt;&lt; "a line of text" &lt;&lt; endl;
output &lt;&lt; some_data_variable &lt;&lt; endl;
output &lt;&lt; "another line of text" &lt;&lt; endl; </PRE>
The proper thing to do in this case is to just write the data out
and let the libraries and the system worry about the buffering.
If you need a newline, just write a newline:
<PRE>
output &lt;&lt; "a line of text\n"
       &lt;&lt; some_data_variable &lt;&lt; '\n'
       &lt;&lt; "another line of text\n"; </PRE>
I have also joined the output statements into a single statement.
You could make the code prettier by, for example, moving the single
newline to the start of the quoted text on the last line.
<P>If you do need to flush the buffer above, you can send an
<TT>endl</TT> if you also need a newline, or just flush the buffer
directly:
<PRE>
output &lt;&lt; ...... &lt;&lt; flush;    // can use std::flush manipulator
output.flush();                  // or call a member fn </PRE>
<P>On the other hand, there are times when writing to a file should
be like writing to standard error: no buffering should be done,
because the data needs to appear quickly (a prime example is a
log file for security-related information). The way to do this is
just to turn off the buffering <EM>before any I/O operations at
all</EM> have been done. Opening the file counts as an I/O operation,
so create the streams unopened, turn off buffering, and only then
open them:
<PRE>
std::ofstream    os;
std::ifstream    is;
int   i;

os.rdbuf()-&gt;pubsetbuf(0,0);
is.rdbuf()-&gt;pubsetbuf(0,0);

os.open("/foo/bar/baz");
is.open("/qux/quux/quuux");

os &lt;&lt; "this data is written immediately\n";
is &gt;&gt; i;   // and this will probably cause a disk read </PRE>
<P>Since all aspects of buffering are handled by a streambuf-derived
member, it is necessary to get at that member with <TT>rdbuf()</TT>.
Then the public version of <TT>setbuf</TT>, named
<TT>pubsetbuf()</TT>, can be called. Its arguments are a buffer area
followed by its size, much like the buffer and size arguments of the
Standard C I/O Library function <TT>setvbuf()</TT>.
<P>A great deal of this is implementation-dependent. For example,
<TT>streambuf</TT> does not specify any actions for its own
<TT>setbuf()</TT>-ish functions. Instead, the classes derived from
<TT>streambuf</TT> each define behavior that "makes sense" for that
class. An argument of (0,0) turns off buffering for <TT>filebuf</TT>
(when done before any I/O) but has no effect on its sibling
<TT>stringbuf</TT>, and specifying anything other than (0,0) has
implementation-defined effects. User-defined classes derived from
streambuf can do whatever they want.
<P>A last reminder: there are usually more buffers involved than
just those at the language/library level. Kernel buffers, disk
buffers, and the like will also have an effect. Inspecting and
changing those is system-dependent.

<P>Return <A HREF="#top">to top of page</A> or
<A HREF="../faq/index.html">to the FAQ</A>.
<H2><A NAME="3">Binary I/O</A></H2>
<P>The first and most important thing to remember about binary I/O is
that opening a file with <TT>ios::binary</TT> is not, repeat
<EM>not</EM>, the only thing you have to do. It is not a silver
bullet, and will not allow you to use the <TT>&lt;&lt;/&gt;&gt;</TT>
operators of the normal fstreams to do binary I/O.

<P>Sorry. Them's the breaks.
<P>This isn't going to try to be a complete tutorial on reading and
writing binary files (because "binary" covers a lot of
ground), but we will try to clear up a couple of misconceptions.
<P>First, <TT>ios::binary</TT> has exactly one defined effect, no more
and no less. Normal text mode has to be concerned with the newline
characters, and the runtime system will translate between (for
example) '\n' and the appropriate end-of-line sequence (LF on Unix,
CRLF on DOS, CR on Macintosh, etc.). (There are other things that
normal mode does, but that's the most obvious.) Opening a file in
binary mode disables this conversion, so reading a CRLF sequence
under Windows won't accidentally get mapped to a '\n' character, etc.
Binary mode is not supposed to suddenly give you a bitstream, and
if it is doing so in your program then you've discovered a bug in
your vendor's compiler (or some other part of the C++ implementation,
possibly the runtime system).
<P>Second, using <TT>&lt;&lt;</TT> to write and <TT>&gt;&gt;</TT> to
read isn't going to work with the standard file stream classes, even
if you use <TT>noskipws</TT> during reading. Why not? Because
ifstream and ofstream exist for the purpose of <EM>formatting</EM>,
not reading and writing. Their job is to interpret the data into
text characters, and that's exactly what you don't want to happen
during binary I/O.
<P>Third, using the <TT>get()</TT> and <TT>put()/write()</TT> member
functions still isn't guaranteed to help you. These are
"unformatted" I/O functions, but still character-based.
(This may or may not be what you want.)
<P>Notice how all the problems here are due to the inappropriate use
of <EM>formatting</EM> functions and classes to perform something
which <EM>requires</EM> that formatting not be done? There are a
seemingly infinite number of solutions, and a few are listed here:
<UL>
<LI>"Derive your own fstream-type classes and write your own
&lt;&lt;/&gt;&gt; operators to do binary I/O on whatever data
types you're using." This is a Bad Thing, because while
the compiler would probably be just fine with it, other humans
are going to be confused. The overloaded bitshift operators
have a well-defined meaning (formatting), and this breaks it.
<LI>"Build the file structure in memory, then <TT>mmap()</TT>
the file and copy the structure." Well, this is easy to
make work, and easy to break, and is pretty much equivalent to
using <TT>::read()</TT> and <TT>::write()</TT> directly, and
makes no use of the iostream library at all...
<LI>"Use streambufs, that's what they're there for."
While not trivial for the beginner, this is the best of all
solutions. The streambuf/filebuf layer is the layer that is
responsible for actual I/O. If you want to use the C++
library for binary I/O, this is where you start.
</UL>
<P>How to go about using streambufs is a bit beyond the scope of this
document (at least for now), but while streambufs go a long way,
they still leave a couple of things up to you, the programmer.
As an example, byte ordering is completely between you and the
operating system, and you have to handle it yourself.

<P>Deriving a streambuf or filebuf class from the standard ones (one
that is specific to your data types, or an abstraction thereof) is
probably a good idea, and lots of examples exist in journals and on
Usenet. Using the standard filebufs directly (either by declaring
your own or by using the pointer returned from an fstream's
<TT>rdbuf()</TT>) is certainly feasible as well.
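<P>Here is one illustration of handling byte order yourself. This
minimal sketch stores a 32-bit unsigned value as four bytes, most
significant byte first, by talking directly to a streambuf. Everything
here is an assumption made for the example: the function names, the
big-endian choice, and the four-byte layout. Because the helpers take a
<TT>std::streambuf&amp;</TT>, they work equally well with a standard
<TT>filebuf</TT> declared directly (as in <TT>save_value</TT>) or with
the one returned by an fstream's <TT>rdbuf()</TT>.
<PRE>
#include &lt;fstream&gt;
#include &lt;streambuf&gt;

typedef std::streambuf::traits_type  sb_traits;

// Hypothetical helper: writes the low 32 bits of 'value' as four bytes,
// most significant byte first, whatever the host's own byte order is.
bool put_u32_be (std::streambuf&amp; sb, unsigned long value)
{
    for (int shift = 24; shift &gt;= 0; shift -= 8)
    {
        char  byte = static_cast&lt;char&gt;((value &gt;&gt; shift) &amp; 0xFF);
        if (sb_traits::eq_int_type(sb.sputc(byte), sb_traits::eof()))
            return false;              // the buffer refused the byte
    }
    return true;
}

// Hypothetical helper: the inverse of put_u32_be().
bool get_u32_be (std::streambuf&amp; sb, unsigned long&amp; value)
{
    value = 0;
    for (int i = 0; i &lt; 4; ++i)
    {
        sb_traits::int_type  c = sb.sbumpc();
        if (sb_traits::eq_int_type(c, sb_traits::eof()))
            return false;              // input ended in the middle of a value
        unsigned char  byte = sb_traits::to_char_type(c);
        value = (value &lt;&lt; 8) | byte;
    }
    return true;
}

// Hypothetical example use: a filebuf declared directly and opened in
// binary mode, so no end-of-line translation touches the bytes.
bool save_value (const char* path, unsigned long value)
{
    std::filebuf  fb;
    if (!fb.open(path, std::ios::out | std::ios::binary))
        return false;
    bool  ok = put_u32_be(fb, value);
    return fb.close() != 0 &amp;&amp; ok;
}</PRE>
The same file reads back as the same value on a little-endian or a
big-endian machine, because the byte order is fixed by the code rather
than by the host.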
<P>One area that causes problems is trying to do bit-by-bit operations
with filebufs. C++ is no different from C in this respect: I/O
must be done at the byte level. If you're trying to read or write
a few bits at a time, you're going about it the wrong way. You
must read/write an integral number of bytes and then process the
bytes. (For example, the single-character streambuf functions such
as <TT>sbumpc()</TT> and <TT>sputc()</TT> deal in whole characters,
and their return values have type <TT>int_type</TT> so that
end-of-file can be distinguished from any character.)
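<P>Here is a minimal sketch of "read whole bytes, then process the
bits": the hypothetical function below counts the 1 bits in everything
a streambuf delivers. Each byte arrives as an <TT>int_type</TT> and is
compared against <TT>eof()</TT> before being turned back into a
character. Only then are its bits examined. The <TT>std::string</TT>
overload, which wraps the data in a <TT>std::stringbuf</TT>, is only
there to make the function easy to try out.
<PRE>
#include &lt;sstream&gt;
#include &lt;streambuf&gt;
#include &lt;string&gt;

// Hypothetical helper: counts the set bits in every byte left in sb.
unsigned long count_set_bits (std::streambuf&amp; sb)
{
    typedef std::streambuf::traits_type  traits;
    unsigned long  total = 0;

    // sbumpc() returns an int_type, not a char, so that end-of-file
    // can be told apart from every possible byte value.
    for (traits::int_type c = sb.sbumpc();
         !traits::eq_int_type(c, traits::eof());
         c = sb.sbumpc())
    {
        // Only a whole byte comes out of the streambuf; the bit-level
        // work happens afterward, in memory.
        unsigned char  byte = traits::to_char_type(c);
        for (; byte != 0; byte &gt;&gt;= 1)
            total += byte &amp; 1u;
    }
    return total;
}

// Hypothetical convenience overload for in-memory data.
unsigned long count_set_bits (const std::string&amp; bytes)
{
    std::stringbuf  sb (bytes, std::ios::in);
    return count_set_bits(sb);
}</PRE>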
<P>Another area of problems is opening text files in binary mode.
Generally, binary mode is intended for binary files, and opening
text files in binary mode means that you now have to deal with all of
those end-of-line and end-of-file problems that we mentioned before.
An instructive thread from comp.lang.c++.moderated delved off into
this topic starting more or less at
<A HREF="http://www.deja.com/getdoc.xp?AN=436187505">this</A>
article and continuing to the end of the thread. (You'll have to
sort through some flames every couple of paragraphs, but the points
<H2><A NAME="4">Iostreams class hierarchy diagram</A></H2>
<P>The <A HREF="iostreams_hierarchy.pdf">diagram</A> is in PDF. Rumor
has it that once Benjamin Kosnik has been dead for a few decades,
this work of his will be hung next to the Mona Lisa in the
<A HREF="http://www.louvre.fr/">Mus&eacute;e du Louvre</A>.
<H2><A NAME="5">What is this &lt;sstream&gt;/stringstreams thing?</A></H2>
<P>Stringstreams (defined in the header <TT>&lt;sstream&gt;</TT>)
are in this author's opinion one of the coolest things since
sliced time. An example of their use is in the Received Wisdom
section for Chapter 21 (Strings),
<A HREF="../21_strings/howto.html#1.1internal"> describing how to
<P>The quick definition is: they are siblings of ifstream and ofstream,
and they do for <TT>std::string</TT> what their siblings do for
files. All that work you put into writing <TT>&lt;&lt;</TT> and
<TT>&gt;&gt;</TT> functions for your classes now pays off
<EM>again!</EM> Need to format a string before passing the string
to a function? Send your stuff via <TT>&lt;&lt;</TT> to an
ostringstream. You've read a string as input and need to parse it?
Initialize an istringstream with that string, and then pull pieces
out of it with <TT>&gt;&gt;</TT>. Have a stringstream and need to
get a copy of the string inside? Just call the <TT>str()</TT>
member function.
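<P>Here is a minimal sketch covering both directions. The function names
and the "name age" input format are made up for the example:
<PRE>
#include &lt;sstream&gt;
#include &lt;string&gt;

// Hypothetical helper: format with the usual &lt;&lt; operators, then
// take a copy of the result with str().
std::string describe (const std::string&amp; name, int age)
{
    std::ostringstream  out;
    out &lt;&lt; name &lt;&lt; " is " &lt;&lt; age &lt;&lt; " years old";
    return out.str();
}

// Hypothetical helper: pull whitespace-separated pieces out of a
// string with &gt;&gt;.  Returns false unless 'line' starts with a word
// followed by an integer.
bool parse_name_age (const std::string&amp; line, std::string&amp; name, int&amp; age)
{
    std::istringstream  in (line);
    in &gt;&gt; name &gt;&gt; age;
    return !in.fail();
}</PRE>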
<P>This only works if you've written your
<TT>&lt;&lt;</TT>/<TT>&gt;&gt;</TT> functions correctly, though,
and correctly means that they take istreams and ostreams as
parameters, not i<B>f</B>streams and o<B>f</B>streams. If they
take the latter, then your I/O operators will work fine with
file streams, but with nothing else -- including stringstreams.
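<P>Here is a minimal sketch of "correctly". It uses a hypothetical
<TT>Point</TT> type whose operators take <TT>std::ostream&amp;</TT> and
<TT>std::istream&amp;</TT>, so the same code serves files,
<TT>std::cout</TT>, and stringstreams. The <TT>(x, y)</TT> text format
is an arbitrary choice for the example.
<PRE>
#include &lt;istream&gt;
#include &lt;ostream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

struct Point { int x, y; };   // hypothetical example type

// Takes std::ostream&amp;, not std::ofstream&amp;, so it works with files,
// std::cout, and ostringstreams alike.
std::ostream&amp; operator&lt;&lt; (std::ostream&amp; os, const Point&amp; p)
{
    return os &lt;&lt; '(' &lt;&lt; p.x &lt;&lt; ", " &lt;&lt; p.y &lt;&lt; ')';
}

// Likewise takes std::istream&amp;.  Reads the "(x, y)" form written
// above; on bad input it sets failbit and leaves 'p' unchanged.
std::istream&amp; operator&gt;&gt; (std::istream&amp; is, Point&amp; p)
{
    char   lparen = 0, comma = 0, rparen = 0;
    Point  tmp = { 0, 0 };
    is &gt;&gt; lparen &gt;&gt; tmp.x &gt;&gt; comma &gt;&gt; tmp.y &gt;&gt; rparen;
    if (is &amp;&amp; lparen == '(' &amp;&amp; comma == ',' &amp;&amp; rparen == ')')
        p = tmp;
    else
        is.setstate(std::ios::failbit);
    return is;
}

// Hypothetical helper: the same operator&lt;&lt;, aimed at an ostringstream.
std::string point_text (const Point&amp; p)
{
    std::ostringstream  out;
    out &lt;&lt; p;
    return out.str();
}</PRE>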
<P>If you are a user of the strstream classes, you need to update
your code. With the stringstream classes, you don't have to
explicitly append <TT>ends</TT> to terminate the C-style character
array, you don't have to mess with "freezing" functions, and you
don't have to manage the memory yourself. The strstreams have been
officially deprecated, which means that 1) future revisions of the
C++ Standard may remove them, and 2) if you use them, people will
laugh at you.
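<P>Here is a minimal before-and-after sketch of such an update. The old
strstream version appears only as a comment, and the function name
<TT>make_label</TT> is made up for the example:
<PRE>
#include &lt;sstream&gt;
#include &lt;string&gt;

// Old, deprecated style using &lt;strstream&gt;:
//
//     std::ostrstream  os;
//     os &lt;&lt; "answer=" &lt;&lt; value &lt;&lt; std::ends;  // add the terminator
//     char*  p = os.str();                     // freezes the buffer
//     ... use p ...
//     os.freeze(false);                        // or the memory leaks
//
// New style: no terminator, no freezing, no manual memory management.
// Hypothetical helper for illustration only.
std::string make_label (int value)
{
    std::ostringstream  os;
    os &lt;&lt; "answer=" &lt;&lt; value;
    return os.str();                            // a std::string copy
}</PRE>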
<!-- ####################################################### -->

<P CLASS="fineprint"><EM>
Comments and suggestions are welcome, and may be sent to
<A HREF="mailto:pme@sources.redhat.com">Phil Edwards</A> or
<A HREF="mailto:gdr@gcc.gnu.org">Gabriel Dos Reis</A>.
<BR> $Id: howto.html,v 1.4 2000/11/29 20:37:02 pme Exp $