C: The Dark Corners
C was designed with simplicity in mind. Despite this, C has a lot of dark corners that are not necessarily well known. Here follows an incomplete collection of them.
For historical reasons, we also include some C++ intricacies that may be a source of confusion when writing C programs.
Constancy
Constant definition
The const keyword always applies to whatever is immediately to its left, if anything, and to whatever is on its right otherwise.
Both of the following lines declare a pointer to a constant integer.
const int * pi;
int const * pi;
The following, however, declares a constant pointer to an integer.
int * const pi;
Pay special attention when declaring pointers to arrays because of the operator precedence. Here we have an array of 12 pointers to constant integers.
const int *pi[12];
The next one is a pointer to an array of 12 constant integers.
const int (*pi)[12];
A const qualifier can be added implicitly to the type a pointer points to (an int * converts to a const int *), but removing one requires an explicit cast.
In C++, it is possible to add the const keyword after a member function prototype to specify that it will not modify the attributes of the object.
Constant pointers
The following is forbidden:
char *pc;
const char **ppc;
ppc = &pc; // Forbidden!
Allowing it would break the constancy rule, since it would then be possible to change the value of **ppc through *pc.
Suppose it were not forbidden:
const char c = 'a';       // Constant variable.
char *pc;                 // Pointer through which we will change c.
const char **ppc = &pc;   // Forbidden, but assume it is not.
*ppc = &c;                // Legal.
*pc = 'b';                // Change c.
So ppc goes through pc to c. Since pc is not a pointer to a constant, we can change the value through it, and the constancy promised by ppc is broken.
C/C++ difference for const
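In C, the following
const int a = 10;
int *p = &a;   // Discards const: compilers usually only warn.
*p = 30;       // Undefined behaviour: a is a const object.
printf("&a: %p, a: %d\n", (void *)&a, a);
printf("p: %p, *p: %d\n", (void *)p, *p);
return 0;
typically outputs something like the following (the address will vary):
&a: 0x3ce30364, a: 30
p: 0x3ce30364, *p: 30
But in C++, the previous code is rejected, since const is enforced more strictly: converting a const int * to an int * requires an explicit cast. There is a workaround though:
const int a = 10;
int *p = (int*)(&a);
*p = 30;
printf("&a: %p, a: %d\n", (void *)&a, a);
printf("p: %p, *p: %d\n", (void *)p, *p);
but the output will typically be:
&a: 0x3ce30364, a: 10
p: 0x3ce30364, *p: 30
Yes, that is the same address and two different values!
This is because a C++ compiler treats a const integer initialized with a constant expression as an immediate value, not a variable, so every read of a is replaced by 10. It behaves similarly to #define. Writing to a const object is undefined behaviour in both languages, so neither output is guaranteed, and a C compiler is allowed to perform the same substitution.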
Constants as static array initializers
Semantically speaking, the const keyword denotes immutable variables and not constants ("constant variable" being an interesting oxymoron).
As such, const variables should not be used as the size of a fixed-size array, since the C standard requires an integer constant expression there, e.g. an integer literal, a sizeof expression, or a macro that expands to one.
int array1[17];
const unsigned int sz = sizeof array1;
int array2[sizeof array1]; // OK
int array3[sz];            // Wrong
In practice, most compilers accept const variables in that case: at block scope, C99 treats array3 as a variable-length array, and C++ considers sz a constant expression.
Function argument evaluation order
From The C Programming Language:
The order in which function arguments are evaluated is unspecified, so the statement printf("%d %d\n", ++n, power(2, n)); can produce different results with different compilers, depending on whether n is incremented before power is called.
Thus it is good practice to avoid expressions with side effects in function arguments.
Arrays
Arrays are not pointers! There is a small number of cases when they behave differently. The following test is true:
array[0] == *array
From the C standard:
Except when it is the operand of the sizeof operator, the _Alignof operator, or the unary & operator, or is a string literal used to initialize an array, an expression that has type “array of type” is converted to an expression with type “pointer to type” that points to the initial element of the array object and is not an lvalue. If the array object has register storage class, the behavior is undefined.
Using sizeof
The sizeof operator follows its own set of rules as described by the standard, and its result depends on the type of its operand. When the operand is an array, it yields the total size of the array in bytes.
long array[3];
long *p = array;
printf("%zu\n", sizeof(array));
printf("%zu\n", sizeof(p));
On machines where long is 8 bytes and pointers are 4 bytes, this will output:
24
4
Arrays are automatically converted to pointers in function arguments. Thus the behavior of sizeof is special only within the scope of an array declaration.
void foo(int array[]) {
    printf("foo: sizeof array == %zu\n", sizeof array);
}
void bar(int array[12]) {
    printf("bar: sizeof array == %zu\n", sizeof array);
}
int main() {
    int array[10];
    printf("main: sizeof array == %zu\n", sizeof array);
    foo(array);
    bar(array);
    return 0;
}
For multidimensional arrays, only the outermost dimension is converted to a pointer. For instance, a parameter declared as int array[M][N] is adjusted to int (*)[N]. The following will output the size of a pointer.
void foo(int array[][3]) {
    printf("foo: sizeof array == %zu\n", sizeof array);
}
int main() {
    int arr[2][3] = {{10, 20, 30}, {40, 50, 60}};
    foo(arr);
    return 0;
}
Addressing arrays
Arrays have a type signature that differs from pointers. The signature of a pointer to an n-array of T is T (*)[n].
long array[3];
long *p;
long **pp;
long (*ap)[3];
p = &array;  // Wrong
pp = &array; // Wrong
ap = &array; // OK
Note that the warning about type comes from the address-of operator (&), since the following code does not prompt any warning:
long array[3];
long *p;
long (*ap)[3];
p = array;   // OK this time
ap = &array; // OK
Conversely, a pointer cannot be assigned to an array:
long array[3]; long *p; array = p; // Wrong
Arrays as strings
Character arrays can only be initialized with a string literal or a brace-enclosed list of values, not with a pointer.
char *p = "hello";
char t0[] = "world";
char t1[] = {'f', 'o', 'o'};
char t2[] = p;             // Error.
char t3[] = (char*) "foo"; // Error.
There is another major difference between initializing a pointer and initializing an array. The pointer is only set to the address of "hello", stored in the static memory segment of the program, whereas the array receives a copy of "world" from this same segment in its own allocated memory. The array can be modified afterwards, unlike the string the pointer refers to: modifying a string literal is undefined behaviour.
Implicit cast
Integer types narrower than int are automatically promoted to int in arithmetic expressions and in variadic function calls. Compare
unsigned char a = 255; a++; printf("%d\n", a);
and
unsigned char a = 255; printf("%d\n", a+1);
There is no loss of information during a promotion, but beware of the char type: C does not specify whether a plain char is signed. Thus signed char or unsigned char should be used explicitly to ensure portability.
From The C Programming Language, section 2.7:
Conversion rules are more complicated when unsigned operands are involved. The problem is that comparisons between signed and unsigned values are machine-dependent, because they depend on the sizes of the various integer types. For example, suppose that int is 16 bits and long is 32 bits. Then -1L < 1U, because 1U, which is an int, is promoted to a signed long. But -1L > 1UL, because -1L is promoted to unsigned long and thus appears to be a large positive number.
See appendix A6 in the book for more implicit conversion rules.
Bit shifting
Be wary of the difference between a logical shift and an arithmetic shift. See this Wikipedia article for more details. Note that it only matters for right shifting.
In C, the result of right-shifting a negative signed number is implementation-defined: the compiler may use either kind of shift.
Modulo operation
In C99, the result of a modulo operation has the sign of the dividend:
printf("-5 %% 2 = %d\n", -5 % 2);
printf("5 %% -2 = %d\n", 5 % -2);
To test whether an integer is odd, you must compare to 0, not 1. Otherwise, the result will be incorrect when the dividend is negative.
if (n % 2 == 1) // WRONG!
if (n % 2 != 0) // Correct.
Operator precedence
The choice for operator precedence in C can be counter-intuitive at times. The expression a & b == 7 is parsed as a & (b == 7).
See this Wikipedia article for more details.
File reading
When a file is opened in text mode (e.g. using the "r" option), the "b" option can be appended to request binary mode instead. POSIX specifies that the "b" option is ignored, since text and binary streams are identical there. Some non-POSIX operating systems, however, may try to be too smart in text mode: they expect a “standard” end-of-line such as \r\n and translate it, which produces unexpected results on files with \n line breaks. The "b" option does no harm and helps portability.
Globals
Tentative definitions (file-scope declarations without an initializer, such as int global;) can appear any number of times in C. In C++, each of them is a full definition, so they can appear only once, or the compiler will complain about double definitions of globals:
#include <stdio.h>
int global;
int global;
int global = 3;
void change() { global = 17; }
int main() {
    printf("%d\n", global);
    change();
    printf("%d\n", global);
    return 0;
}
In C, it will display the following:
3
17
Pointer arithmetic
It is not safe to assume that pointer arithmetic results in any particular integral type such as int. Some architectures may have memory addresses indexed over 64-bit values, while using data over 32 bits. The appropriate types are provided by stddef.h. For example, a pointer difference is stored as a type ptrdiff_t.
Size of void
With GCC, sizeof(void) == 1 is true. This is a GNU extension: the standard does not allow sizeof to be applied to an incomplete type such as void. Using -pedantic will output a warning.
Alignment
Do not expect the memory layout in structures to be as the code describes it: the compiler is free to insert padding between members and at the end of the structure for alignment purposes.
This proves dangerous when serializing data. Use the offsetof macro to get the real offset of each structure member.
struct {char a; int b;} foo;
struct {char a; char b;} bar;
printf("sizeof foo == %zu\n", sizeof foo);
printf("&foo == %p\n", (void *)&foo);
printf("&foo.a == %p\n", (void *)&foo.a);
printf("&foo.b == %p\n", (void *)&foo.b);
printf("sizeof bar == %zu\n", sizeof bar);
printf("&bar == %p\n", (void *)&bar);
printf("&bar.a == %p\n", (void *)&bar.a);
printf("&bar.b == %p\n", (void *)&bar.b);
Precompiled headers
Compiling a header file may yield an unexpected result: some compilers such as GCC will recognize the extension and act accordingly. In that case, building a header will not result in an executable, but in a precompiled header, that is, an optimization for large headers.
If you want to force or prevent the build of precompiled headers, GCC allows for specifying the input language:
# The .xml file will be seen as a C header file.
gcc -x c-header myfile.xml
# The .h file will be compiled into an executable.
gcc -x c myfile.h
Final note
The numerous dark corners of C require some getting used to. It is helpful and good practice to make heavy use of your compiler’s warning flags, together with some fine “lint” tools.
References
- BSD/GNU man pages
- The C Programming Language, D. Ritchie & B. Kernighan
- Draft of the C standard
- Wikipedia/Arithmetic shift
- Wikipedia/Operator precedence
- Wikipedia/Type conversion
- Better variadic functions for C
- The C Library Reference Guide