Operators: Similar to Java, C++

C has a rich set of operators, most of which will be familiar if you’ve used a C-style language. How an expression is parsed is governed by precedence (which operators bind their operands more tightly) and associativity (how operators of the same precedence are grouped).

Here is a table of C’s operators, from highest to lowest precedence.

Early Termination (Short-Circuit Evaluation)

The logical operators || (boolean-or) and && (boolean-and) have a special property: they don’t always evaluate their second operand. This is known as short-circuiting and is a powerful and commonly used feature.

  • For A && B, if A evaluates to false (0), the entire expression must be false, so B is never evaluated.
  • For A || B, if A evaluates to true (non-zero), the entire expression must be true, so B is never evaluated.

Consider this example:

#include <stdio.h>
#include <stdbool.h>
 
bool less_than(int x, int y) {
    printf("Checking if %d < %d\n", x, y);
    return (x < y);
}
 
int main(int argc, char *argv[]) {
    // This checks if 1 < argc < 4
    if (less_than(argc, 4) && less_than(1, argc)) {
        printf("Yes, 1 < argc (%d) < 4\n", argc);
    }
    return 0;
}

The output demonstrates short-circuiting in action:

$ gcc -Wall -o early early.c
$ ./early         # argc is 1
Checking if 1 < 4
Checking if 1 < 1
$ ./early a       # argc is 2
Checking if 2 < 4
Checking if 1 < 2
Yes, 1 < argc (2) < 4
$ ./early a b     # argc is 3
Checking if 3 < 4
Checking if 1 < 3
Yes, 1 < argc (3) < 4
$ ./early a b c   # argc is 4
Checking if 4 < 4 # This is false, so the second less_than is NOT called!
$

When argc is 4, less_than(argc, 4) returns false. The && operator immediately stops, and the second operand, less_than(1, argc), is never evaluated.
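A common use of short-circuiting is to guard an operation that would otherwise be invalid. The sketch below is only an illustration: safe_ratio_exceeds is a hypothetical helper, not a library function. It divides only when the divisor is non-zero.

```c
#include <stdbool.h>

// Hypothetical helper: returns true if num / den > limit.
// The division is only evaluated when den != 0, because &&
// skips its right operand when the left operand is false.
bool safe_ratio_exceeds(int num, int den, int limit) {
    return den != 0 && num / den > limit;
}
```

Swapping the two operands would defeat the guard, because the division would then run before the check.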

Ternary Conditional Operator

The ternary operator (? :) is a compact way to write an if-else expression.

result = boolean_expr ? result_if_true : result_if_false;

  1. boolean_expr is evaluated.
  2. If it’s true (non-zero), result_if_true is evaluated and becomes the value of the whole expression. result_if_false is not evaluated.
  3. If it’s false (zero), result_if_false is evaluated and becomes the value of the whole expression. result_if_true is not evaluated.

It’s great for simple conditional assignments, like handling pluralization:

#include <stdio.h>
 
int main(int argc, char *argv[]) {
    // If argc is 2, use "", otherwise use "s"
    printf("Passed %d argument%s.\n", argc - 1, argc == 2 ? "" : "s");
    return 0;
}

Assignment Operators

We’ve already seen that in C, an assignment is an expression, not just a statement. The value of an assignment expression is the value that was assigned.

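This allows for idioms like if ((rc = func())), which stores the result of func() in rc and then tests whether it is non-zero. The doubled parentheses signal to the compiler (and the reader) that the assignment is intentional, not a mistyped ==.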

C also provides compound assignment operators that combine an operation with an assignment. x += y; is shorthand for x = x + y; (except that x is evaluated only once).

This works for most binary operators: -=, *=, /=, %=, <<=, &=, etc.

What is Associativity Again?

Associativity determines how operators of the same precedence are grouped.

  • Left-to-right associativity:
    • A + B + C is evaluated as (A + B) + C
  • Right-to-left associativity:
    • A += B += C is evaluated as A += (B += C)
    • This makes sense for assignment, but for most other operators, left-to-right is what you’d expect.
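The two groupings can be checked with a small sketch. The function names and values here are made up for illustration.

```c
// Right-to-left: a += b += c is parsed as a += (b += c).
int chained_compound(void) {
    int a = 1, b = 2, c = 3;
    a += b += c;   // b becomes 5 first, then a becomes 1 + 5 = 6
    return a;
}

// Left-to-right: 10 - 4 - 3 is parsed as (10 - 4) - 3.
int chained_subtract(void) {
    return 10 - 4 - 3;   // 3, whereas 10 - (4 - 3) would be 9
}
```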

Post-increment and Pre-increment

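These operators are often said to come from the auto-increment addressing modes of the PDP-11, but they predate it: Ken Thompson introduced them in B, C’s predecessor, before the PDP-11 existed. They do map naturally onto such addressing modes, though.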

  • i++ (post-increment):
    • Value: The expression evaluates to the current value of i.
    • Effect: After the value is taken, i is incremented by 1.
  • ++i (pre-increment):
    • Effect: i is incremented by 1.
    • Value: The expression evaluates to the new value of i.

The same logic applies to i-- and --i. These operators work for any arithmetic type (integers, floats) and, importantly, for pointers.
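A short sketch of the difference between the two forms, followed by the classic pointer use. Both function names are hypothetical.

```c
// Records the value of i++ and ++i, and the value of i after each.
void increment_demo(int out[4]) {
    int i = 5;
    out[0] = i++;   // value is the old i (5); i is now 6
    out[1] = i;     // 6
    out[2] = ++i;   // i becomes 7 first; value is 7
    out[3] = i;     // 7
}

// Post-increment on a pointer: read the element, then move on.
int sum_array(const int *p, int n) {
    int total = 0;
    while (n-- > 0) {
        total += *p++;   // use *p, then advance p to the next element
    }
    return total;
}
```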

Casting

You can explicitly convert, or cast, a value from one type to another. The syntax is to put the new type name in parentheses before the expression.

unsigned int ui = 0xDEADBEEF;
signed int i = (signed int)ui;
// i now has the value -559038737 (assuming a 32-bit, two's-complement int)

A cast is an operator. What it does depends on the types involved:

  • When casting between integer types of the same size (like unsigned int to signed int), the bit-representation does not change on two’s-complement machines, which covers every platform you are likely to use. The bits are simply reinterpreted according to the rules of the new type.
  • When casting between types of different sizes or between integers and floats, the value and bit-representation will change. We’ll cover these rules in the next chapter.
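As a small preview of a value-changing cast, here is a sketch using int and double. The function names are hypothetical.

```c
// Integer division truncates; casting one operand to double first
// makes the division happen in floating point.
double average_of(int total, int count) {
    return (double)total / count;
}

// Casting a double to int discards the fractional part.
int truncate_to_int(double x) {
    return (int)x;
}
```

With total = 7 and count = 2, plain integer division 7 / 2 gives 3, while average_of(7, 2) gives 3.5.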

Summary: C Operators

  • Precedence and associativity determine how operators group their operands.
  • Early termination (&&, ||) is a key feature for control flow.
  • The ternary operator (? :) provides a compact if-else expression.
  • Assignment operators are expressions themselves.
  • Post/pre inc/decrement (i++, ++i) are useful but have subtle timing differences.
  • Casting allows explicit type conversion, often by reinterpreting bits.

Arrays in C

Arrays

An array in C is a simple, powerful, and dangerous construct.

  • It’s a finite vector of variables, all of the same type, stored contiguously in memory.
  • For an N-element array a:
    • The first element is a[0].
    • The last element is a[N-1].

#include <stdio.h>
 
float data[5]; // data to average and total
float total;   // total of the data items
float average; // average of the items
 
int main(void) {
    data[0] = 34.0;
    data[1] = 27.0;
    data[2] = 45.0;
    data[3] = 82.0;
    data[4] = 22.0;
 
    total = data[0] + data[1] + data[2] + data[3] + data[4];
    average = total / 5.0;
    printf("Total %f Average %f\n", total, average);
    return(0);
}

The C compiler does not check array bounds!

If you write to data[5] or data[-1], the compiler won’t stop you. The behavior is undefined: in practice, your program usually writes to whatever memory happens to be adjacent to the array, silently corrupting other data or crashing much later. This is a very common and dangerous bug. Always check your array bounds!
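One way to do that check is to route writes through a small function. This is a minimal sketch, assuming a fixed-size array like data[5] above; DATA_LEN and data_set are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define DATA_LEN 5   // hypothetical size, matching data[5] above

static float data[DATA_LEN];

// Hypothetical checked store: refuses out-of-range indices
// instead of writing outside the array. Because index is unsigned,
// a negative value passed in becomes huge and is rejected too.
bool data_set(size_t index, float value) {
    if (index >= DATA_LEN) {
        return false;   // out of bounds: nothing is written
    }
    data[index] = value;
    return true;
}
```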

Multi-dimensional Arrays

You can create multi-dimensional arrays, which are just arrays of arrays. In memory, they are laid out contiguously in row-major order.

int mat[3][3]; is laid out in memory as: mat[0][0], mat[0][1], mat[0][2], mat[1][0], mat[1][1], mat[1][2], mat[2][0], mat[2][1], mat[2][2]

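This has huge performance implications. Accessing elements in the order they are laid out (row by row, with the last index varying fastest) is cache-friendly and fast. Striding across rows, or jumping around, is much slower on large arrays.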
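The sketch below shows a traversal that follows the memory layout, plus the arithmetic behind row-major order. ROWS, COLS and the function names are assumptions for illustration.

```c
#include <stddef.h>

#define ROWS 3
#define COLS 3

// Row-by-row traversal: the inner loop walks consecutive addresses.
int sum_row_major(int mat[ROWS][COLS]) {
    int total = 0;
    for (size_t i = 0; i < ROWS; i++) {
        for (size_t j = 0; j < COLS; j++) {
            total += mat[i][j];
        }
    }
    return total;
}

// Position of mat[i][j] in the flat layout, counted in elements.
size_t flat_index(size_t i, size_t j) {
    return i * COLS + j;
}
```

Swapping the two loops visits the same elements but jumps COLS elements at a time, which is the slow pattern on large arrays.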

Array Initializers

You can initialize an array when you define it using curly braces.

#include <stdio.h>
 
int main(int argc, char *argv[]) {
    int i, j;
    int a[3] = {3, 7, 9};
    int b[3][3] = {
        {1, 2, 3},
        {4, 5, 6},
        {7, 8, 9},
    };
 
    for(i = 0; i < 3; i++) {
        printf("a[%d] = %d\n", i, a[i]);
        for(j = 0; j < 3; j++) {
            printf(" b[%d][%d] = %d\n", i, j, b[i][j]);
        }
    }
    return 0;
}

Strings

This is a critical concept in C.

  • C has no real string type!
  • Instead, a “string” is a convention: it is an array of chars terminated with a null byte (0 or '\0').

The compiler provides syntactic sugar for this convention. These two definitions are identical:

// These strings are identical
char s1[6] = "hello";
char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };

The string literal "hello" automatically includes the terminating null byte, so it requires an array of size 6.
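A quick way to see the extra byte is to compare sizeof with strlen. The function names in this sketch are hypothetical.

```c
#include <string.h>

// Bytes occupied by the array, including the terminating null byte.
size_t hello_storage(void) {
    char s[] = "hello";   // size inferred from the literal: 6
    return sizeof(s);
}

// strlen counts characters up to, but not including, the null byte.
size_t hello_length(void) {
    return strlen("hello");   // 5
}

// The two definitions from the text produce the same bytes.
int same_bytes(void) {
    char s1[6] = "hello";
    char s2[6] = { 'h', 'e', 'l', 'l', 'o', 0 };
    return memcmp(s1, s2, sizeof(s1)) == 0;
}
```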

String Library Functions

Since there are no built-in string operations, C provides a standard library (<string.h>) of functions that operate on these null-terminated character arrays. The functions are generally named strxxx().

#include <stdio.h>
#include <string.h>
 
int main(int argc, char *argv[]) {
    char name1[12], name2[12];
    char mixed[25], title[20];
 
    strncpy(name1, "Rosalinda", 12); // Bounded copy: writes at most 12 bytes
    strncpy(name2, "Zeke", 12);
    strncpy(title, "This is the title.", 20);
 
    printf(" %s\n\n", title);
    printf("Name 1 is %s\n", name1);
    printf("Name 2 is %s\n", name2);
 
    // Compare strings
    if (strncmp(name1, name2, 12) > 0) {
        strncpy(mixed, name1, 25);
    } else {
        strncpy(mixed, name2, 25);
    }
    printf("The biggest name alphabetically is %s\n", mixed);
 
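    // Concatenate strings. strncat's n limits how much is appended,
    // so pass the space left in mixed (minus 1 for the null byte).
    strncpy(mixed, name1, sizeof(mixed) - 1);
    mixed[sizeof(mixed) - 1] = '\0';
    strncat(mixed, " & ", sizeof(mixed) - strlen(mixed) - 1);
    strncat(mixed, name2, sizeof(mixed) - strlen(mixed) - 1);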
    printf("Both names are %s\n", mixed);
    return 0;
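}

  • strncpy(dest, src, n): Copies at most n characters from src to dest. If src has n or more characters, dest is not null-terminated, so n should be smaller than the size of dest or you must add the terminator yourself.
  • strncmp(s1, s2, n): Compares at most n characters of s1 and s2.
  • strncat(dest, src, n): Appends at most n characters of src to the end of dest, then adds a null byte. Note that n limits what is taken from src; it is not the size of dest.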

Warning

Always use the n versions of these functions (strncpy, strncat, etc.). The versions without n (strcpy, strcat) do not check for buffer sizes and are a massive source of security vulnerabilities. The n versions are only safe if n really reflects the space available in the destination.
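Even strncpy needs care: when the source is too long, it fills the buffer without writing a terminator. A minimal sketch of a truncating copy that always terminates follows; copy_truncated is a hypothetical helper, not part of <string.h>.

```c
#include <string.h>

// Hypothetical helper: copy src into dest (capacity cap bytes, cap > 0),
// truncating if necessary, and always null-terminating the result.
void copy_truncated(char *dest, size_t cap, const char *src) {
    strncpy(dest, src, cap - 1);   // leave room for the terminator
    dest[cap - 1] = '\0';          // strncpy does not add one on truncation
}
```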

Summary: C Arrays

  • Arrays of basic types are contiguous blocks of memory.
  • Multidimensional arrays are stored in row-major order.
  • Initializers provide a convenient syntax for defining array contents.
  • Strings are a convention: null-terminated arrays of characters.
  • The string library provides functions to manipulate these arrays.

We’ll see a lot more about how arrays and pointers are deeply intertwined as the course progresses.

The C Preprocessor

The preprocessor is the first stage of the C toolchain. It’s a powerful, if somewhat crude, tool that transforms your source code text before the compiler proper ever sees it. It’s the foundation of C’s modularity and enables many common idioms.

#include

This is the most common preprocessor directive. It literally pastes the content of one file into another.

#include <file1.h>
#include "file2.h"

  • This is the basic mechanism for defining and using APIs in C.
  • The difference between <...> and "" is the search path:
    • <> is for system headers, and searches the system include directories (e.g., /usr/include).
    • "" is for your own headers: it typically searches first in the directory of the including file, then falls back to the <> search path.
  • Included files can include other files. This can lead to problems if you’re not careful.

Here’s a demonstration. We have a .c file and a .h file it includes:

When we run the preprocessor (gcc -E), the contents of cpp_example.h are pasted into cpp_example.c, and all the macros are expanded. The lines starting with # are markers for the compiler to keep track of original file names and line numbers for error messages.

Macro Definitions (#define)

Macros perform token-based text substitution. The preprocessor doesn’t understand C syntax; it just replaces tokens.

#define FOO BAZ
#define BAR(x) (x+3)
...
#undef FOO
#define QUX

  • Any subsequent occurrence of the token FOO is replaced with the token BAZ.
  • BAR(4) is replaced with the text (4+3). Note that it is not replaced with 7; the preprocessor doesn’t do arithmetic.
  • #undef removes a macro definition.
  • #define QUX defines QUX with an empty body: QUX counts as defined (for #ifdef), and any occurrence of it expands to nothing.
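Because the substitution is purely textual, operator precedence can change the meaning of a macro argument. The sketch below contrasts BAR as defined above with BAR_SAFE, a hypothetical variant that parenthesizes its argument.

```c
#define BAR(x) (x+3)            // as defined above
#define BAR_SAFE(x) ((x) + 3)   // hypothetical variant: argument parenthesized

// Expands to (flag ? 1 : 2+3): the +3 only applies to the else branch.
int bar_cond(int flag) {
    return BAR(flag ? 1 : 2);
}

// Expands to ((flag ? 1 : 2) + 3): the +3 applies to the whole argument.
int bar_safe_cond(int flag) {
    return BAR_SAFE(flag ? 1 : 2);
}
```

With flag set to 1, bar_cond returns 1 while bar_safe_cond returns 4, which is why macro parameters are normally wrapped in parentheses.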

Macros can be large

You can create multi-line macros, which were historically used for inlining small functions.

#define SKIP_SPACES(p, limit) \
{ char *lim = (limit);        \
  while (p < lim) {           \
    if (*p++ != ' ') {        \
      p--; break; }}}

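This is dangerous because it’s pure text substitution: p appears several times in the body and is modified, so the argument must be a simple, assignable pointer variable with no side effects. A more subtle problem is the “swallowing the semicolon” issue. If you use this macro like a function call in an if-else statement, you get a syntax error: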

if (*p != 0)
    SKIP_SPACES(p, my_limit); // Expands to { ... };
else // This 'else' has no matching 'if'!
    ...

The semicolon after the macro becomes a null statement, separating the if from the else. The solution is a common C idiom: wrap the macro body in a do { ... } while(0) loop. This makes the entire macro a single statement that can correctly “swallow” the trailing semicolon.
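Here is a sketch of SKIP_SPACES rewritten with that idiom. The macro body is the same as before; first_non_space is a hypothetical caller added to show the if-else case compiling.

```c
#include <stddef.h>

// Same body as before, wrapped in do { ... } while (0) so that
// "SKIP_SPACES(p, end);" is exactly one statement.
#define SKIP_SPACES(p, limit)     \
    do {                          \
        char *lim = (limit);      \
        while (p < lim) {         \
            if (*p++ != ' ') {    \
                p--;              \
                break;            \
            }                     \
        }                         \
    } while (0)

// Hypothetical caller: returns a pointer to the first non-space
// character in [s, end), or NULL if s is NULL.
char *first_non_space(char *s, char *end) {
    char *p = s;
    if (p != NULL)
        SKIP_SPACES(p, end);   // the semicolon now ends the if branch cleanly
    else
        return NULL;
    return p;
}
```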

Preprocessor Conditionals

The preprocessor can conditionally include or exclude blocks of code. This is essential for writing code that can be compiled for different platforms or with different features enabled.

#if expression
    ... // text 1
#else
    ... // text 2
#endif
 
#ifdef FOO      // Shorthand for #if defined(FOO)
    ...
#endif
 
#ifndef BAR     // Shorthand for #if !defined(BAR)
    ...
#endif

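The expression is an integer constant expression, evaluated by the preprocessor before compilation proper. It can contain integer literals, operators, defined(), and other macros; any identifier that is not a defined macro evaluates to 0.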
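A minimal sketch of conditional compilation in use. DEBUG_LEVEL, LOG_MODE and log_mode are hypothetical names; #elif, not shown above, is the preprocessor’s else-if.

```c
// DEBUG_LEVEL is a hypothetical macro; it could also be set on the
// command line, e.g. gcc -DDEBUG_LEVEL=2 ...
#ifndef DEBUG_LEVEL
#define DEBUG_LEVEL 0
#endif

#if DEBUG_LEVEL >= 2
#define LOG_MODE "verbose"
#elif DEBUG_LEVEL == 1
#define LOG_MODE "basic"
#else
#define LOG_MODE "off"
#endif

// Reports which branch the preprocessor kept.
const char *log_mode(void) {
    return LOG_MODE;
}
```

Only one of the three LOG_MODE definitions ever reaches the compiler; the other two are discarded as text.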

Token Manipulation

The preprocessor has two special operators for advanced macro magic:

  • # (Stringizing): Turns a macro argument into a string literal.
  • ## (Token Pasting): Concatenates two tokens to form a single new token.

These are often used to reduce boilerplate. Imagine a table mapping command strings to handler functions:

struct command {
    const char *name;          // command string, e.g. "quit"
    void (*function)(void);    // handler taking no arguments
};
 
struct command commands[] = {
    { "quit", quit_command},
    { "help", help_command},
    ...
};

This is repetitive. We’d rather write COMMAND(quit). We can define this macro using stringizing and token pasting:

#define COMMAND(c) { #c, c ## _command }

  • #c turns quit into "quit".
  • c ## _command turns quit into the single token quit_command.
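Putting the pieces together, a complete table might look like the sketch below. The handler bodies, last_called and run_command are hypothetical additions for illustration.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical handlers; each records that it ran.
static int last_called = 0;
static void quit_command(void) { last_called = 1; }
static void help_command(void) { last_called = 2; }

struct command {
    const char *name;
    void (*function)(void);
};

#define COMMAND(c) { #c, c ## _command }

static struct command commands[] = {
    COMMAND(quit),   // expands to { "quit", quit_command }
    COMMAND(help),   // expands to { "help", help_command }
};

// Looks up name in the table and runs its handler.
// Returns 1 if a handler was found, 0 otherwise.
int run_command(const char *name) {
    size_t n = sizeof(commands) / sizeof(commands[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(commands[i].name, name) == 0) {
            commands[i].function();
            return 1;
        }
    }
    return 0;
}
```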

Predefined Macros

The preprocessor provides several useful built-in macros:

  • __FILE__: The name of the current source file.
  • __LINE__: The current line number in the source file.
  • __DATE__: The compilation date.
  • __TIME__: The compilation time.
  • __STDC__: Defined as 1 by compilers that conform to the C standard.
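These macros are most often used in diagnostics. A small sketch follows; LOG_HERE and current_line are hypothetical names.

```c
#include <stdio.h>

// Hypothetical logging macro: prefixes a message with the location
// of the call. __FILE__ and __LINE__ are expanded where LOG_HERE is used.
#define LOG_HERE(msg) \
    fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, (msg))

// Returns the line number of the return statement itself.
int current_line(void) {
    return __LINE__;
}
```

Because macros expand at the point of use, LOG_HERE reports the caller’s line rather than the line where the macro was defined.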

Summary: C Preprocessor

  • It’s the first stage of the toolchain.
  • #include "" and #include <> paste file contents.
  • #define performs powerful but simple text substitution.
  • The do-while(0) trick is used to “swallow the semicolon” in complex macros.
  • #if, #ifdef, #ifndef allow for conditional compilation.
  • # (stringizing) and ## (token pasting) enable advanced macro techniques.

Modularity in C

C has no built-in language features for modules, packages, or namespaces. Instead, modularity is achieved through a set of conventions using the preprocessor and the linker.

Declarations vs. Definitions

This is a crucial distinction in C.

  • A declaration says something exists, somewhere. It introduces a name and its type to the compiler. A function prototype is a declaration:
      char *strncpy(char *dest, const char *src, size_t n); // A "prototype"
  • A definition says what that thing is. It provides the actual code or allocates the storage for a variable:
      char *strncpy(...) { ... body ... }

Compilation Units, extern, and static

  • C deals with “compilation units”: a single .c file plus everything it includes. Each compilation unit is compiled independently into an object file.
  • Declarations can be annotated to control their visibility across compilation units:
    • extern: This is a promise to the compiler that the definition for this thing exists somewhere else, likely in another compilation unit. This is the default for functions.
    • static: This restricts the visibility of a declaration (and its definition) to the current compilation unit only. It cannot be seen or accessed from other files.

This applies to global variables as well:

// In a header file, you might declare:
extern const char *banner; // Defined in some other .c file
 
// In a .c file, you might declare and define:
static int priv_count = 0; // Only in scope in this unit
 
// In some other .c file, you provide the definition for the extern variable:
const char *banner = "Welcome to Barrelfish";
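The same pieces can be tried out in a single file, since a declaration and its definition may appear in the same compilation unit. In this sketch, get_banner and banner_requests are hypothetical functions added for illustration.

```c
// Declaration: promises that banner is defined somewhere.
extern const char *banner;

// Internal linkage: priv_count is invisible to other compilation units.
static int priv_count = 0;

// Hypothetical function using both.
const char *get_banner(void) {
    priv_count++;   // count how often the banner was requested
    return banner;
}

int banner_requests(void) {
    return priv_count;
}

// Definition: normally this would live in a different .c file.
const char *banner = "Welcome to Barrelfish";
```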

Modularity in C: Header Files

The convention for creating a module in C is to split it into two files: a header file and a source file.

  • A module is a self-contained piece of a program.
  • It consists of an interface (the externally visible parts: functions, types, macros) and an implementation (the internal parts that clients shouldn’t see).

  • Specify interfaces with header files (.h)
    • A module foo has its public interface in foo.h.
    • Clients of the module #include "foo.h".
    • foo.h contains no definitions, only external declarations (function prototypes, extern variables, typedefs).
  • Implementation is in a source file (.c)
    • The implementation is typically in foo.c.
    • foo.c also includes its own header, foo.h, to let the compiler check for consistency between declarations and definitions.
    • foo.c contains the definitions for the interface functions, plus any internal (static) functions and variables.
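As a sketch of this convention, here is a hypothetical counter module. The two files are shown in one listing, which is also what the compiler sees after the preprocessor has pasted the header in.

```c
/* ---- counter.h (interface; hypothetical module) ---- */
#ifndef COUNTER_H
#define COUNTER_H

void counter_increment(void);
int counter_value(void);

#endif /* COUNTER_H */

/* ---- counter.c (implementation) ---- */
/* In a real source file, #include "counter.h" would appear here. */

static int count = 0;   /* internal state, hidden from clients */

void counter_increment(void) { count++; }
int counter_value(void) { return count; }
```

Clients can call counter_increment() and counter_value(), but cannot touch count directly because it is static.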

The Header Guard Idiom

What happens if a file includes a.h and b.h, and b.h also includes a.h? The contents of a.h would appear twice in the same compilation unit. Repeating a function prototype is harmless, but defining the same type (a struct, for example) twice is a compiler error.

To prevent this, every header file must use a header guard, a standard preprocessor idiom.

// "file.h":
#ifndef FILE_H
#define FILE_H
 
... // all declarations and macros go here
 
#endif // FILE_H

  • The first time the preprocessor sees this file, FILE_H is not defined, so it defines it and processes the contents.
  • The second time it sees this file in the same compilation unit, FILE_H is already defined, so the #ifndef is false, and the preprocessor skips the entire contents.
  • This ensures the file’s contents only appear once.
  • Avoid guard names like _FILE_H_: identifiers that begin with an underscore followed by a capital letter are reserved for the compiler and standard library.
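The effect can be simulated in a single file by writing the guarded contents twice, as if a hypothetical header point.h had been included two times.

```c
/* First inclusion of the (hypothetical) header point.h: */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

/* Second inclusion: POINT_H is now defined, so this copy is skipped.
   Without the guard, struct point would be defined twice, which is an error. */
#ifndef POINT_H
#define POINT_H
struct point { int x; int y; };
#endif /* POINT_H */

int point_sum(struct point p) {
    return p.x + p.y;
}
```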

Danger

Never #include a .c file! Including source files breaks the separation of declaration and definition, and if the same .c file ends up in more than one compilation unit, the linker reports multiple-definition errors.

Summary: Modularity

  • The key distinction is between declarations (what exists) and definitions (what it is).
  • extern and static control visibility across compilation units.
  • Modularity is achieved by convention: header files (.h) for interfaces (declarations) and source files (.c) for implementations (definitions).
  • The header guard idiom (#ifndef...) is essential to prevent multiple inclusion errors.