Operators: Similar to Java, C++
C has a rich set of operators, most of which will be familiar if you’ve used a C-style language. Their behavior is governed by precedence (which operators are evaluated first) and associativity (in what order operators of the same precedence are evaluated).
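Here is a table of C’s operators, from highest to lowest precedence, with the associativity of each group:
- () [] -> . and postfix ++ -- (left to right)
- unary ! ~ ++ -- + - * & (type) sizeof (right to left)
- * / % (left to right)
- + - (left to right)
- << >> (left to right)
- < <= > >= (left to right)
- == != (left to right)
- & (left to right)
- ^ (left to right)
- | (left to right)
- && (left to right)
- || (left to right)
- ?: (right to left)
- = += -= *= /= %= &= ^= |= <<= >>= (right to left)
- , (left to right)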
Early Termination (Short-Circuit Evaluation)
The logical operators || (boolean-or) and && (boolean-and) have a special property: they don’t always evaluate their second operand. This is known as short-circuiting and is a powerful and commonly used feature. Both operators also guarantee that the left operand is evaluated before the right one.
- For A && B, if A evaluates to false (0), the entire expression must be false, so B is never evaluated.
- For A || B, if A evaluates to true (non-zero), the entire expression must be true, so B is never evaluated.
Consider this example:
```c
#include <stdio.h>
#include <stdbool.h>

bool less_than(int x, int y) {
    printf("Checking if %d < %d\n", x, y);
    return (x < y);
}

int main(int argc, char *argv[]) {
    // This checks if 1 < argc < 4
    if (less_than(argc, 4) && less_than(1, argc)) {
        printf("Yes, 1 < argc (%d) < 4\n", argc);
    }
    return 0;
}
```
The output demonstrates short-circuiting in action:
```console
$ gcc -Wall -o early early.c
$ ./early              # argc is 1
Checking if 1 < 4
Checking if 1 < 1
$ ./early a            # argc is 2
Checking if 2 < 4
Checking if 1 < 2
Yes, 1 < argc (2) < 4
$ ./early a b          # argc is 3
Checking if 3 < 4
Checking if 1 < 3
Yes, 1 < argc (3) < 4
$ ./early a b c        # argc is 4
Checking if 4 < 4      # This is false, so the second less_than is NOT called!
$
```
When argc is 4, less_than(argc, 4) returns false. The && operator immediately stops, and the second operand, less_than(1, argc), is never evaluated.
Ternary Conditional Operator
The ternary operator (? :) is a compact way to write an if-else expression.

```c
result = boolean_expr ? result_if_true : result_if_false;
```

First, boolean_expr is evaluated. Then:
- If it’s true (non-zero), result_if_true is evaluated and becomes the value of the whole expression; result_if_false is not evaluated.
- If it’s false (zero), result_if_false is evaluated and becomes the value of the whole expression; result_if_true is not evaluated.
It’s great for simple conditional values, like handling pluralization:

```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    // If argc is 2, use "", otherwise use "s"
    printf("Passed %d argument%s.\n", argc - 1, argc == 2 ? "" : "s");
    return 0;
}
```
Assignment Operators
We’ve already seen that in C, an assignment is an expression, not just a statement. The value of an assignment expression is the value that was assigned.
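This allows for idioms like if ((rc = func())), which calls func(), stores the result in rc, and then tests the stored value. The extra parentheses tell both the reader and the compiler (GCC warns otherwise) that the assignment is intentional and not a mistyped ==.
C also provides compound assignment operators that combine an operation with an assignment.
x += y; is shorthand for x = x + y; except that x is evaluated only once.
This works for most binary operators: -=, *=, /=, %=, <<=, >>=, &=, |=, ^=.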
What is Associativity Again?
Associativity determines how operators of the same precedence are grouped.
- Left-to-right associativity: A + B + C is grouped as (A + B) + C.
- Right-to-left associativity: A += B += C is grouped as A += (B += C).
- Right-to-left grouping makes sense for assignment (and for the unary operators and ?:), but for most other operators, left-to-right is what you’d expect.
Post-increment and Pre-increment
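These operators are often said to come from the autoincrement and autodecrement addressing modes of the PDP-11, and they do map naturally onto such hardware. Dennis Ritchie, however, noted that they already existed in B, C’s predecessor, before the PDP-11 was available.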
- i++ (post-increment):
  - Value: the expression evaluates to the current value of i.
  - Effect: after the value is taken, i is incremented by 1.
- ++i (pre-increment):
  - Effect: i is incremented by 1.
  - Value: the expression evaluates to the new value of i.
The same logic applies to i-- and --i. These operators work for any scalar type (integers, floats) and, importantly, for pointers.
Casting
You can explicitly convert, or cast, a value from one type to another. The syntax is to put the new type name in parentheses before the expression.
```c
unsigned int ui = 0xDEADBEEF;
signed int i = (signed int)ui;
// With a 32-bit two's-complement int, i now has the value -559038737
```
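A cast is an operator. What it does depends on the types involved:
- When casting between integer types of the same size (like unsigned int to signed int), the bit representation does not change. The bits are simply reinterpreted according to the rules of the new type. (Strictly, the C standard makes the result implementation-defined when the value does not fit in the signed type, but two's-complement compilers such as GCC reinterpret the bits.)
- When casting between types of different sizes, or between integers and floats, the bit representation changes, and the value may change too: for example, narrowing to an unsigned type keeps only the low-order bits, and converting a float to an int discards the fractional part. We’ll cover these rules in the next chapter.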
Summary: C Operators
- Precedence and associativity determine how operators group. In general, they do not determine the order in which operands are evaluated.
- Early termination (&&, ||) is a key feature for control flow.
- The ternary operator (? :) provides a compact if-else expression.
- Assignment operators are expressions themselves.
- Post/pre increment and decrement (i++, ++i) are useful but have subtle timing differences.
- Casting allows explicit type conversion, often by reinterpreting bits.
Arrays in C
Arrays
An array in C is a simple, powerful, and dangerous construct.
- It’s a finite vector of variables, all of the same type, stored contiguously in memory.
- For an N-element array a:
  - The first element is a[0].
  - The last element is a[N-1].
```c
#include <stdio.h>

float data[5];   // data to average and total
float total;     // total of the data items
float average;   // average of the items

int main() {
    data[0] = 34.0;
    data[1] = 27.0;
    data[2] = 45.0;
    data[3] = 82.0;
    data[4] = 22.0;

    total = data[0] + data[1] + data[2] + data[3] + data[4];
    average = total / 5.0;
    printf("Total %f Average %f\n", total, average);
    return(0);
}
```
The C compiler does not check array bounds!
If you write to data[5] or data[-1], the compiler won’t stop you. Formally this is undefined behavior. In practice, your program will usually write to whatever memory happens to be adjacent to the array, silently corrupting other data. This is a very common and dangerous bug. Always check your array bounds!
Multi-dimensional Arrays
You can create multi-dimensional arrays, which are just arrays of arrays. In memory, they are laid out contiguously in row-major order.
int mat[3][3];
is laid out in memory as:
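mat[0][0], mat[0][1], mat[0][2], mat[1][0], mat[1][1], mat[1][2], mat[2][0], mat[2][1], mat[2][2]
This has huge performance implications. Accessing elements in this order (row by row, with the last index varying fastest) touches consecutive addresses, which is cache-friendly and fast. Jumping around, for example walking down a column of a large array, skips a whole row’s worth of memory at every step and is much slower.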
Array Initializers
You can initialize an array when you define it using curly braces.
```c
#include <stdio.h>

int main(int argc, char *argv[]) {
    int i, j;
    int a[3] = {3, 7, 9};
    int b[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
    };

    for (i = 0; i < 3; i++) {
        printf("a[%d] = %d\n", i, a[i]);
        for (j = 0; j < 3; j++) {
            printf(" b[%d][%d] = %d\n", i, j, b[i][j]);
        }
    }
    return 0;
}
```
Strings
This is a critical concept in C.
- C has no real string type!
- Instead, a “string” is a convention: it is an array of chars terminated with a null byte (0 or '\0').
The compiler provides syntactic sugar for this convention. These two definitions are identical:
```c
// These strings are identical
char s1[6] = "hello";
char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };
```
The string literal "hello" automatically includes the terminating null byte, so it requires an array of size 6.
String Library Functions
Since there are no built-in string operations, C provides a standard library (<string.h>) of functions that operate on these null-terminated character arrays. The functions are generally named strxxx().
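```c
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
    char name1[12], name2[12];
    char mixed[25], title[20];

    // Bounded copies. strncpy does not add a '\0' if the source fills
    // all n bytes, so reserve the last byte and terminate explicitly.
    strncpy(name1, "Rosalinda", sizeof(name1) - 1);
    name1[sizeof(name1) - 1] = '\0';
    strncpy(name2, "Zeke", sizeof(name2) - 1);
    name2[sizeof(name2) - 1] = '\0';
    strncpy(title, "This is the title.", sizeof(title) - 1);
    title[sizeof(title) - 1] = '\0';

    printf(" %s\n\n", title);
    printf("Name 1 is %s\n", name1);
    printf("Name 2 is %s\n", name2);

    // Compare strings
    if (strncmp(name1, name2, sizeof(name1)) > 0) {
        strncpy(mixed, name1, sizeof(mixed) - 1);
    } else {
        strncpy(mixed, name2, sizeof(mixed) - 1);
    }
    mixed[sizeof(mixed) - 1] = '\0';
    printf("The biggest name alphabetically is %s\n", mixed);

    // Concatenate strings. strncat's n limits how many characters are
    // appended, so pass the space left in mixed (minus 1 for the '\0').
    strncpy(mixed, name1, sizeof(mixed) - 1);
    mixed[sizeof(mixed) - 1] = '\0';
    strncat(mixed, " & ", sizeof(mixed) - strlen(mixed) - 1);
    strncat(mixed, name2, sizeof(mixed) - strlen(mixed) - 1);
    printf("Both names are %s\n", mixed);
    return 0;
}
```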
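- strncpy(dest, src, n): copies at most n characters from src to dest, padding with null bytes if src is shorter. If src has n or more characters, dest is not null-terminated, so you must terminate it yourself.
- strncmp(s1, s2, n): compares at most n characters of s1 and s2, returning a negative, zero, or positive value.
- strncat(dest, src, n): appends at most n characters of src to the end of dest, followed by a null byte. n limits the characters appended, not the size of dest, so dest needs room for strlen(dest) + n + 1 bytes.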
Warning
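Prefer the bounded n versions of these functions (strncpy, strncat, etc.). The versions without n (strcpy, strcat) do not check buffer sizes and are a massive source of security vulnerabilities. The n versions are only as safe as the n you pass, though: strncpy may leave dest unterminated, and strncat’s n counts characters to append, not the size of dest.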
Summary: C Arrays
- Arrays of basic types are contiguous blocks of memory.
- Multidimensional arrays are stored in row-major order.
- Initializers provide a convenient syntax for defining array contents.
- Strings are a convention: null-terminated arrays of characters.
- The string library provides functions to manipulate these arrays.
We’ll see a lot more about how arrays and pointers are deeply intertwined as the course progresses.
The C Preprocessor
The preprocessor is the first stage of the C toolchain. It’s a powerful, if somewhat crude, tool that transforms your source code text before the compiler proper ever sees it. It’s the foundation of C’s modularity and enables many common idioms.
#include
This is the most common preprocessor directive. It literally pastes the content of one file into another.
```c
#include <file1.h>
#include "file2.h"
```
- This is the basic mechanism for defining and using APIs in C.
- The difference between <...> and "..." is the search path: <> is for system headers (e.g., /usr/include). "" is for your own headers: with GCC and most other compilers, it searches first in the directory of the file doing the including, then falls back to the system directories.
- Included files can include other files. This can lead to problems if you’re not careful.
Here’s a demonstration. We have a .c file (cpp_example.c) and a .h file it includes (cpp_example.h).
When we run the preprocessor (gcc -E), the contents of cpp_example.h are pasted into cpp_example.c, and all the macros are expanded. The lines starting with # are markers for the compiler to keep track of original file names and line numbers for error messages.
Macro Definitions (#define)
Macros perform token-based text substitution. The preprocessor doesn’t understand C syntax; it just replaces tokens.
```c
#define FOO BAZ
#define BAR(x) (x+3)
...
#undef FOO
#define QUX
```
- Any subsequent occurrence of the token FOO is replaced with the token BAZ.
- BAR(4) is replaced with the text (4+3). Note that it is not replaced with 7; the preprocessor doesn’t do arithmetic.
- #undef removes a macro definition.
- #define QUX defines QUX with an empty replacement list: it expands to nothing, but QUX is now defined (so #ifdef QUX is true).
Macros can be large
You can create multi-line macros, which were historically used for inlining small functions.
```c
#define SKIP_SPACES(p, limit)  \
{ char *lim = (limit);         \
  while (p < lim) {            \
    if (*p++ != ' ') {         \
      p--; break; }}}
```
This is dangerous because it’s pure text substitution. A more subtle problem is the “swallowing the semicolon” issue. If you use this macro like a function call in an if-else statement, you get a syntax error:
```c
if (*p != 0)
    SKIP_SPACES(p, my_limit);   // Expands to { ... };
else                            // This 'else' has no matching 'if'!
    ...
```
The semicolon after the macro becomes a null statement, separating the if from the else. The solution is a common C idiom: wrap the macro body in a do { ... } while (0) loop. This makes the entire macro a single statement that needs the trailing semicolon, so the semicolon is correctly “swallowed”.
Preprocessor Conditionals
The preprocessor can conditionally include or exclude blocks of code. This is essential for writing code that can be compiled for different platforms or with different features enabled.
```c
#if expression
... // text 1
#else
... // text 2
#endif

#ifdef FOO    // Shorthand for #if defined(FOO)
...
#endif

#ifndef BAR   // Shorthand for #if !defined(BAR)
...
#endif
```
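The expression is evaluated by the preprocessor at compile time. It must be an integer constant expression: it can contain integer literals, operators, other macros, and defined(NAME). Any remaining identifier that is not a macro evaluates to 0.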
Token Manipulation
The preprocessor has two special operators for advanced macro magic:
- # (stringizing): turns a macro argument into a string literal.
- ## (token pasting): concatenates two tokens to form a single new token.
These are often used to reduce boilerplate. Imagine a table mapping command strings to handler functions:
```c
struct command {
    char *name;
    void (*function)();
};

struct command commands[] = {
    { "quit", quit_command },
    { "help", help_command },
    ...
};
```
This is repetitive. We’d rather write COMMAND(quit). We can define this macro using stringizing and token pasting:
#define COMMAND(c) { #c, c ## _command }
- #c turns quit into "quit".
- c ## _command turns quit into the single token quit_command.
Predefined Macros
The preprocessor provides several useful built-in macros:
- __FILE__: the name of the current source file.
- __LINE__: the current line number in the source file.
- __DATE__: the compilation date.
- __TIME__: the compilation time.
- __STDC__: defined (as 1) if the compiler conforms to the ISO C standard.
Summary: C Preprocessor
- It’s the first stage of the toolchain.
- #include "" and #include <> paste file contents.
- #define performs powerful but simple text substitution.
- The do-while(0) trick is used to “swallow the semicolon” in complex macros.
- #if, #ifdef, and #ifndef allow for conditional compilation.
- # (stringizing) and ## (token pasting) enable advanced macro techniques.
Modularity in C
C has no built-in language features for modules, packages, or namespaces. Instead, modularity is achieved through a set of conventions using the preprocessor and the linker.
Declarations vs. Definitions
This is a crucial distinction in C.
- A declaration says something exists, somewhere. It introduces a name and its type to the compiler. A function prototype is a declaration.
char *strncpy(char *dest, const char *src, size_t n); // A "prototype"
- A definition says what that thing is. It provides the actual code or allocates the storage for a variable.
char *strncpy(...) { ... body ... }
Compilation Units, extern, and static
- C deals with “compilation units”: a single .c file plus everything it includes. Each compilation unit is compiled independently into an object file.
- Declarations can be annotated to control their visibility across compilation units:
  - extern: This is a promise to the compiler that the definition for this thing exists somewhere else, likely in another compilation unit. This is the default for functions.
  - static: This restricts the visibility of a declaration (and its definition) to the current compilation unit only. It cannot be seen or accessed from other files.
This applies to global variables as well:
```c
// In a header file, you might declare:
extern const char *banner;   // Defined in some other .c file

// In a .c file, you might declare and define:
static int priv_count = 0;   // Only in scope in this unit

// In some other .c file, you provide the definition for the extern variable:
const char *banner = "Welcome to Barrelfish";
```
Modularity in C: Header Files
The convention for creating a module in C is to split it into two files: a header file and a source file.
- A module is a self-contained piece of a program.
- It consists of an interface (the externally visible parts: functions, types, macros) and an implementation (the internal parts that clients shouldn’t see).
- Specify interfaces with header files (.h):
  - A module foo has its public interface in foo.h.
  - Clients of the module #include "foo.h".
  - foo.h contains no function or variable definitions, only declarations (function prototypes, extern variables, typedefs) plus the types and macros that clients need.
- Implementation is in a source file (.c):
  - The implementation is typically in foo.c.
  - foo.c also includes its own header, foo.h, to let the compiler check for consistency between declarations and definitions.
  - foo.c contains the definitions for the interface functions, plus any internal (static) functions and variables.
The Header Guard Idiom
What happens if a file includes a.h and b.h, and b.h also includes a.h? The contents of a.h would appear twice in the same compilation unit. Repeated prototypes and extern declarations are harmless, but repeated definitions, such as a struct definition, typically cause a compiler error.
To prevent this, every header file must use a header guard, a standard preprocessor idiom.
```c
// "file.h":
#ifndef FILE_H
#define FILE_H
... // all declarations and macros go here
#endif // FILE_H
```

- The first time the preprocessor sees this file, FILE_H is not defined, so it defines it and processes the contents.
- The second time it sees this file in the same compilation unit, FILE_H is already defined, so the #ifndef is false, and the preprocessor skips the entire contents.
- This ensures the file’s contents only appear once.

The guard name avoids a leading underscore: identifiers that begin with an underscore followed by an uppercase letter are reserved for the compiler and standard library.
Danger
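Never #include a .c file! Including source files breaks the separation of declaration and definition. If the same .c file ends up in two compilation units (or is also compiled on its own), its function and variable definitions appear twice, and the linker reports multiple-definition errors.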
Summary: Modularity
- The key distinction is between declarations (what exists) and definitions (what it is).
- extern and static control visibility across compilation units.
- Modularity is achieved by convention: header files (.h) for interfaces (declarations) and source files (.c) for implementations (definitions).
- The header guard idiom (#ifndef...) is essential to prevent multiple inclusion errors.