Copyright (C) 2000 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers that GCC needs to be buildable with.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
Until now this knowledge has been scattered around, so I thought I'd
collect it in one useful place. Please add and correct any problems
as you come across them.

I'm going to start from a base of the ISO C89 standard, since that is
probably what most people code to naturally. Obviously using
constructs introduced after that is not a good idea.

The first section of this file deals strictly with portability issues,
the second with common coding pitfalls.

K+R C compilers and preprocessors have no notion of unary '+'. Thus
the following code snippet contains two portability problems (the
comments show the portable spelling):

  int x = +2;  /* int x = 2;  */
  #if +1       /* #if 1  */
  #endif

K+R C compilers did not have a void pointer, and used char * as the
pointer to anything. The macro PTR is defined as either void * or
char * depending on whether you have a standards-compliant compiler or
a K+R one. Thus

  free ((void *) h->value.expansion);

should be written

  free ((PTR) h->value.expansion);
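
Here is a sketch of how a PTR-style macro can select the pointer type. It assumes the choice is keyed on __STDC__, and the real logic in ansidecl.h may differ. The helper make_buffer is hypothetical. It only shows that code written against PTR compiles the same way whichever definition is in effect.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch only: pick the generic pointer type from the compiler's
   conformance.  ansidecl.h holds the real PTR definition.  */
#ifndef PTR
# if defined (__STDC__)
#  define PTR void *
# else
#  define PTR char *
# endif
#endif

/* Hypothetical helper: allocate a buffer and hand it back as PTR.  */
static PTR
make_buffer (size_t size)
{
  PTR p = (PTR) malloc (size);

  assert (p != NULL);
  return p;
}
```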

K+R C did not allow concatenation of string literals like

  "This is a " "single string literal".

Moreover, some compilers like MSVC++ have fairly low limits on the
maximum length of a string literal. The lowest we've come across is
509, which is also the minimum that C89 requires an implementation to
support. You may need to break up a long printf statement into many
smaller ones.
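
For example, a long help message can be emitted one line at a time. That way no single literal comes near the limit, and no adjacent literals need pasting. The print_usage function and its option text are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical help printer: one short fprintf call per line instead of
   one long literal.  Returns the number of characters written, or a
   negative value if an fprintf call fails.  */
static int
print_usage (FILE *stream, const char *progname)
{
  static const char *const lines[] = {
    "Options:\n",
    "  -o <file>  Place the output into <file>\n",
    "  -v         Display the programs invoked\n"
  };
  size_t i;
  int n, total;

  assert (stream != NULL);
  total = fprintf (stream, "Usage: %s [options] file...\n", progname);
  if (total < 0)
    return total;
  for (i = 0; i < sizeof lines / sizeof lines[0]; i++)
    {
      n = fprintf (stream, "%s", lines[i]);
      if (n < 0)
        return n;
      total += n;
    }
  return total;
}
```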

ISO C (6.8.3 in the 1990 standard) specifies the following:

  If (before argument substitution) any argument consists of no
  preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so a macro invocation in which an argument is empty needs to be coded
in some other way.
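
One way to avoid the empty argument is to give the "nothing here" case a macro of its own. The macro names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical macros.  DEFINE_COUNTER_WITH (, hits) would pass an
   empty argument, which C89 leaves undefined and some older compilers
   reject, so the no-storage-class case gets its own macro.  */
#define DEFINE_COUNTER_WITH(storage, name) storage int name = 0
#define DEFINE_COUNTER(name)               int name = 0
```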

The signed keyword did not exist in K+R compilers; it was introduced
in ISO C89, so you cannot use it. In both K+R and standard C,
unqualified char and bitfields may be signed or unsigned. There is no
way to portably declare signed chars or signed bitfields.

All other arithmetic types are signed unless you use the 'unsigned'
qualifier. For instance, it is safe to write plain 'short' where you
mean 'signed short'.

If you have an algorithm that depends on signed char or signed
bitfields, you must find another way to write it before it can be
used in GCC.
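
For instance, code that wants the signed value of an 8-bit byte can compute it arithmetically instead of storing the byte in a signed char. The sketch below assumes the byte is meant as an 8-bit two's-complement value. The name sign_extend_8 is hypothetical.

```c
#include <assert.h>

/* Return the low 8 bits of C read as a two's-complement value, without
   'signed char' and without caring whether plain char is signed.
   Flipping bit 7 and subtracting its weight maps 0..127 to 0..127 and
   128..255 to -128..-1.  */
static int
sign_extend_8 (int c)
{
  return ((c & 0xff) ^ 0x80) - 0x80;
}
```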

You need to provide a function prototype for every function before you
use it, and functions must be defined K+R style. The function
prototype should use the PARAMS macro, which takes a single argument,
so the parameter list must be enclosed in an extra pair of
parentheses. For example:

  int myfunc PARAMS ((double, int *));

You also need to use PARAMS when referring to function prototypes in
other circumstances; for example, see "Calling functions through
pointers to functions" below.

Variable-argument functions are best described by example:

  void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...));

  void
  cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...))
  {
  #ifndef ANSI_PROTOTYPES
    cpp_reader *pfile;
    const char *msgid;
  #endif
    va_list ap;
    VA_START (ap, msgid);
  #ifndef ANSI_PROTOTYPES
    pfile = va_arg (ap, cpp_reader *);
    msgid = va_arg (ap, const char *);
  #endif
    ...
    va_end (ap);
  }

For the curious, here are the definitions of the above macros; see
ansidecl.h for these and more.

  #define PARAMS(paramlist)  paramlist  /* ISO C.  */
  #define VPARAMS(args)      args

  #define PARAMS(paramlist)  ()         /* K+R C.  */
  #define VPARAMS(args)      (va_alist) va_dcl

Calling functions through pointers to functions
-----------------------------------------------

K+R C compilers require parentheses around the dereferenced pointer
variable. For example, if p->handler has the type

  typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));

then the call

  p->handler (pfile, p->arg);

needs to be written

  (p->handler) (pfile, p->arg);
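
Here is a self-contained sketch of the parenthesised call. The names struct directive, count_chars and run_directive are hypothetical stand-ins for the cpp structures above. It uses ISO prototypes instead of PARAMS so that it compiles on its own.

```c
#include <assert.h>

/* Hypothetical handler type and table entry.  */
typedef int (*handler_fn) (int base, const char *arg);

struct directive
{
  handler_fn handler;
  const char *arg;
};

/* Example handler: BASE plus the length of ARG.  */
static int
count_chars (int base, const char *arg)
{
  int n = base;

  while (*arg++ != '\0')
    n++;
  return n;
}

/* Call through the pointer using the parenthesised form.  */
static int
run_directive (const struct directive *p, int base)
{
  return (p->handler) (base, p->arg);
}
```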

The rules under K+R C and ISO C for achieving stringification and
token pasting are quite different. Therefore some macros have been
defined which do the right thing for either kind of compiler.

  CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d)

will paste the tokens passed as arguments. You must not leave any
space around the commas. Also,

will stringify an argument; to get the same result on K+R and ISO
compilers, x should not have spaces around it.
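
The idea behind such macros can be sketched as follows, assuming the choice is made on __STDC__. This is not a copy of ansidecl.h, and STRINGIFY_ARG is an invented name.

```c
#include <assert.h>
#include <string.h>

#if defined (__STDC__)
# define CONCAT2(a,b)      a##b
# define STRINGIFY_ARG(x)  #x
#else
/* Traditional preprocessors paste tokens across an empty comment and
   substitute macro arguments inside string literals.  */
# define CONCAT2(a,b)      a/**/b
# define STRINGIFY_ARG(x)  "x"
#endif
```

The traditional branch is why spaces matter. Any blank around an argument ends up in the pasted or stringified result there.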

In K+R C, you have to cast enum types to use them as integers, and
some compilers give lots of warnings for using an enum where an
integer is expected.
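
Here is a small sketch of the explicit cast when an enum indexes a table. The enumeration and the table are hypothetical.

```c
#include <assert.h>

enum color { RED, GREEN, BLUE, N_COLORS };

static const char *const color_names[(int) N_COLORS] =
  { "red", "green", "blue" };

/* Convert the enum to int explicitly before indexing.  */
static const char *
color_name (enum color c)
{
  return color_names[(int) c];
}
```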

See also "signed keyword" above. In K+R C only unsigned int bitfields
were defined (not unsigned char, unsigned short or unsigned long), and
using plain int, short or long was not allowed.
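
Here is a minimal sketch that follows this rule, using hypothetical field names. Every field is unsigned int, and values are masked to the field width before being stored.

```c
#include <assert.h>

struct flags
{
  unsigned int used : 1;
  unsigned int in_struct : 1;
  unsigned int mode : 6;        /* holds 0 .. 63 */
};

/* Pack three values into a struct flags, masking each to its width.  */
static struct flags
make_flags (unsigned int used, unsigned int in_struct, unsigned int mode)
{
  struct flags f;

  f.used = used & 1;
  f.in_struct = in_struct & 1;
  f.mode = mode & 63;
  return f;
}
```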

Some implementations crash upon attempts to free or realloc the null
pointer. Thus if mem might be null, you need to test it before
passing it to free or realloc.
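
Where this comes up often, small wrappers keep the test in one place. The functions safe_free and safe_realloc are hypothetical helpers, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>

/* Never pass a null pointer to free.  */
static void
safe_free (void *mem)
{
  if (mem)
    free (mem);
}

/* Never pass a null pointer to realloc; allocate fresh memory instead.
   As with realloc, a null result leaves MEM untouched.  */
static void *
safe_realloc (void *mem, size_t size)
{
  if (mem)
    return realloc (mem, size);
  return malloc (size);
}
```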

K+R C has "entry" as a reserved keyword, so you should not use it as
a variable or function name.

K+R used unsigned-preserving rules for arithmetic expressions, while
ISO uses value-preserving rules. This means an unsigned char compared
to an int is done as an unsigned comparison in K+R (since unsigned
char promotes to unsigned) while it is signed in ISO (since all of
the values in unsigned char fit in an int, it promotes to int).

** Not having any argument whose type is a short type (char, short,
float of any flavor) and subject to promotion. **
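
To get the same comparison under both sets of rules, convert the operands explicitly so that neither promotion rule decides the outcome. Here is a sketch with hypothetical function names.

```c
#include <assert.h>

/* Signed comparison under both K+R and ISO rules.  */
static int
greater_signed (unsigned char c, int n)
{
  return (int) c > n;
}

/* Unsigned comparison under both K+R and ISO rules.  */
static int
greater_unsigned (unsigned char c, int n)
{
  return (unsigned int) c > (unsigned int) n;
}
```

With c = 200 and n = -1, the first function returns true. The second returns false, because -1 becomes the largest unsigned int.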

You weren't going to use them anyway, but trigraphs were not defined
in K+R C, and some otherwise ISO C compliant compilers do not accept
them.

Suffixes on Integer Constants
-----------------------------

**Using a 'u' suffix on integer constants.**
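
This sketch assumes the concern here is that K+R compilers do not accept the 'u' suffix, which was introduced by ISO C89. Under that assumption, unsigned constants can be written with casts instead. The macro names are hypothetical.

```c
#include <assert.h>

/* Mask of the low N bits; N must be less than the width of
   unsigned long.  Written with a cast instead of 1ul.  */
#define LOW_BITS_MASK(n)  (((unsigned long) 1 << (n)) - 1)

/* Bit 15 as an unsigned int, instead of 0x8000u.  */
#define HIGH_BIT_16       ((unsigned int) 1 << 15)
```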

Common Coding Pitfalls
======================

errno might be declared as a macro, so include <errno.h> and never
declare it yourself with 'extern int errno;'.
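
Here is a sketch of the safe pattern: include <errno.h>, set errno to zero before the call, and check it afterwards. The parse_long function is a hypothetical helper built on the standard strtol.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a decimal long.  Returns 0 on success, ERANGE if the value
   overflows, and -1 if S is not entirely a number.  */
static int
parse_long (const char *s, long *out)
{
  char *end;

  errno = 0;                    /* errno as <errno.h> defines it */
  *out = strtol (s, &end, 10);
  if (end == s || *end != '\0')
    return -1;
  return errno;
}
```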

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble. First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long. It does not. It converts to
unsigned int. This mostly causes problems on 64-bit platforms, where
long and int are not the same size.

Second, if you write a function definition with no return type at
all, that function is expected to return int, *not* void. GCC will
warn about this. K+R C has no problem with 'void' as a return type,
so you need not worry about that.

Implicit function declarations always have return type int. So if you
correct such a definition of a function operate() to return void, but
operate() is called above its definition, you will get an error about
a "type mismatch with previous implicit declaration". The cure is to
prototype all functions at the top of the file, or in an appropriate
header.
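
Here is a sketch of both cures, using hypothetical function names. It spells out the full 'unsigned long' in a cast, and it puts a prototype ahead of the first call. It uses ISO prototypes instead of PARAMS so that it compiles on its own.

```c
#include <assert.h>

/* Prototype first, so the call in twice () does not create an
   implicit 'int operate ()' that clashes with the definition below.  */
static void operate (long *value);

static long
twice (long v)
{
  operate (&v);
  return v;
}

static void
operate (long *value)
{
  *value *= 2;
}

/* '(unsigned) x' would mean '(unsigned int) x', which is narrower
   than long on many 64-bit hosts.  */
static unsigned long
to_unsigned_long (long x)
{
  return (unsigned long) x;
}
```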

Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice. When you are processing 7-bit ASCII, it does
not matter. But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem. The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT. In the proper locale, isalpha will be true for this
character (code 225). But if you read '\341' from a file and store it
in a plain char, isalpha(c) may look up character 225, or it may look
up character -31. And the ctype table has no entry at offset -31, so
your program will crash. (If you're lucky.)
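
The usual remedy is to convert to unsigned char just before calling a <ctype.h> function. Here is a minimal sketch; char_is_alpha is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>

/* '\341' stored in a signed plain char is -31; the cast turns it back
   into 225 before it reaches the ctype table.  */
static int
char_is_alpha (char c)
{
  return isalpha ((unsigned char) c);
}
```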

It is wise to use unsigned char everywhere you possibly can. This
avoids all these problems. Unfortunately, the routines in <string.h>
take plain char pointers, so you have to remember to cast back and
forth, or avoid the use of the strxxx() functions, which is probably
preferable.

Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions. They return
either a character, converted from unsigned char to int, or EOF, a
negative value distinct from every such character, and no char type
can hold all of these values. If you use a signed char, some legal
character value may be confused with EOF, such as '\377' (SMALL
LETTER Y WITH UMLAUT, in Latin-1). If you use unsigned char, EOF itself
becomes '\377' and a comparison with EOF never succeeds. The correct
choice is int.
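
Here is a sketch of the correct pattern, using a hypothetical count_bytes helper. The result of getc is kept in an int and compared with EOF before it is treated as a character.

```c
#include <assert.h>
#include <stdio.h>

/* Count the bytes in STREAM.  C is an int, so a 0xff byte is 255 and
   never compares equal to EOF.  */
static long
count_bytes (FILE *stream)
{
  long n = 0;
  int c;

  assert (stream != NULL);
  while ((c = getc (stream)) != EOF)
    n++;
  return n;
}
```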

A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
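
A corrected sketch stores the pushed-back values as int, the same type getchar returns. The value of NPUSHBACK is an assumption, and get is written without the unused parameter.

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4             /* assumed buffer size */

/* int, not unsigned char, so a pushed-back EOF stays EOF.  */
static int pushback[NPUSHBACK];
static int pbidx;

#define unget(c) (assert (pbidx < NPUSHBACK), pushback[pbidx++] = (c))
#define get() (pbidx ? pushback[--pbidx] : getchar ())
```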

Other common pitfalls
---------------------

o Expecting plain char to be either sign-extended or zero-extended
  when it is converted to int.

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs, at least in the early days).

o Expecting ints shifted right to be sign extended.

o Modifying the same object twice without an intervening sequence
  point, as in i = i++.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o Assuming qsort is a stable sort. It need not be, so items that
  compare equal may end up in different orders with different qsort
  implementations.

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to a .c
  file instead of to a .h file.
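
For the qsort item, one way to get the same order from every implementation is to make the comparison function a total order. For example, it can break ties on each element's original position. Here is a sketch with a hypothetical struct item.

```c
#include <assert.h>
#include <stdlib.h>

struct item
{
  int key;
  int index;                    /* original position, the tie-breaker */
};

/* Order by key, then by original position, so no two distinct
   elements compare equal and qsort's stability no longer matters.  */
static int
compare_items (const void *pa, const void *pb)
{
  const struct item *a = (const struct item *) pa;
  const struct item *b = (const struct item *) pb;

  if (a->key != b->key)
    return a->key < b->key ? -1 : 1;
  if (a->index != b->index)
    return a->index < b->index ? -1 : 1;
  return 0;
}
```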