Now that we’ve set the stage and explored the mindset of a systems programmer, it’s time to introduce our primary tool: the C programming language. C is old, simple, and powerful. It’s the lingua franca of systems programming for a reason. It doesn’t hold your hand, but it gives you unparalleled control over the machine.
History and Toolchain
History
To understand C, it helps to know where it came from.
- Developed between 1969 and 1972 by Dennis Ritchie at the legendary Bell Labs. (Brian Kernighan co-authored the famous book on the language, but did not design it.)
- It didn’t appear in a vacuum. It has a clear lineage:
  - CPL (Combined Programming Language, 1963) was a massive, complex language, famously considered “unimplementable.”
  - BCPL (Basic CPL, 1967) was a radical simplification, stripping the language down to a single data type: the machine word. It was a kind of portable assembly.
  - B (1969) was Ken Thompson’s adaptation of BCPL for the new Unix operating system.
  - C was Ritchie’s successor to B, adding back a simple type system.
- It was highly influenced by the DEC PDP-11 architecture, the machine Unix was being ported to. Some of C’s quirks make a lot more sense when you know the PDP-11’s instruction set.
- Despite its specific origins, it was designed to be portable across many architectures, which was a key factor in its success and the success of Unix.
Standards
- K&R C: The original “standard” was simply the language described in Kernighan and Ritchie’s book, The C Programming Language. The compiler’s source code was the ultimate specification!
- ANSI C (C89/C90): The first formal standard.
- C99: A major update that added many useful features. This is the version we’ll use in this course.
- C11, C17: More recent updates. C11 added features such as atomics, threads, and `_Generic`; C17 is a bug-fix release with no new language features.
- …and many C-like variants that have borrowed its syntax and philosophy.
C’s Enduring Popularity
More than 50 years on, C is still ubiquitous. Look at the TIOBE Programming Community Index, which tracks language popularity. C has consistently been at or near the top for decades.
Why does this old, seemingly primitive language persist?
Compared to Java, C#, PHP, Python, etc…
C offers a unique set of trade-offs that keep it relevant.
- It is very fast.
  - A good C compiler generates highly optimized machine code. It’s very hard to write assembly by hand that’s consistently faster.
  - It’s also hard for a high-level managed language like Java to match C across the board. The layers of abstraction (JVM, garbage collection) have a cost.
- It has a powerful macro preprocessor (`cpp`). This is a text-substitution tool that runs before the compiler proper. It’s a blunt instrument, but it enables a lot of powerful idioms.
- It is close to the metal. This is the killer feature. When you look at a line of C, you can usually tell what the code is doing to the hardware. There is no hidden runtime doing work behind your back.
Because of these characteristics, C remains the language of choice for:
- Operating System developers
- Embedded systems programmers
- People who really care about speed (high-performance computing, game engines)
- Authors of security exploits (who need precise control over the machine’s state)
What You Don’t Get
C’s power comes from its simplicity, which means it lacks many features you might take for granted.
- No objects, classes, traits, features, methods, or interfaces. C is a procedural language. You have data and you have functions that operate on that data. That’s it. (We’ll see function pointers later, which let you build object-like systems yourself).
- No fancy built-in types. There’s no built-in `string` or `list` type. The types provided are mostly just what the hardware provides (integers and floats of various sizes). You use type constructors (`struct`) to build more complex types yourself.
- No exception handling. There is no `try`/`catch`. The usual convention is to use integer return codes: a function returns 0 for success and a non-zero value to indicate an error.
The Most Important Difference
If there’s one thing to take away about what makes C different, it’s this:
- No automatic memory management.
- There is no garbage collection.
- Memory is either allocated on the stack (and disappears when a function returns) or on the heap.
- Heap structures must be explicitly created and freed by the programmer. This is a huge source of bugs, but also a huge source of C’s performance and predictability.
- Pointers: direct access to memory addresses. A pointer is not a “reference”; it’s a variable that holds a memory address, a raw number. It’s weakly typed by what it points to, but you can make it point to anything.
C is about directly building and manipulating structures in main memory! This is the fundamental mental model. You’re not working with abstract objects; you’re arranging bytes in memory.
Syntax: The Good News & The Bad News
The good news is that C’s syntax is probably already familiar to you.
- The syntax of Java, JavaScript, C++, and C# was almost entirely lifted from C.
- Comments (`/* ... */`, `//`) are the same.
- Identifiers are mostly the same as in Java.
- Block structure using `{ ... }` is the C way.
The main differences are subtle but important:
- The list of reserved words is different.
- C is run through a macro preprocessor. This is a separate step that performs text substitution, file inclusion, and conditional compilation before the actual compiler sees the code. C# has preprocessor directives, but it’s not a separate, text-based stage like in C.
Hello, World!
Here is the canonical first program in C. It’s a good way to see the basic structure.
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("hello, world\n");
    return 0;
}
Let’s break it down:
- `#include <stdio.h>`: This is a preprocessor directive. It tells the preprocessor to find the “header file” `stdio.h` and paste its contents right here. This file contains declarations for standard input/output functions, like `printf`. It’s a bit like an interface file in Java or C#.
- `int main(...)`: Every C program must have a `main` function. This is the entry point where execution begins. It takes a list of command-line arguments and returns an integer status code.
- `printf(...)`: This is a standard library function for printing formatted strings. The `\n` is an escape sequence for a newline character; unlike some languages, newlines are not added automatically.
- `return 0;`: This exits the `main` function and returns a status code of 0 to the operating system. By convention, 0 means “everything is OK.” Since C has no exceptions, this is the primary way to signal success or failure.
The C Toolchain: From Source to Execution
How does that simple text file become a running program? It’s a multi-stage process called the toolchain. In C, these stages are explicit, and you can stop the process at any point to inspect the intermediate results.
- Editing: You start by writing source code in text files, typically with `.c` (source) and `.h` (header) extensions.
- Preprocessing (`cpp`): The preprocessor (`cpp`) takes a `.c` file, handles all the `#include`, `#define`, and other directives, and produces a single, large, preprocessed C source file (often with a `.i` extension).
- Compilation (`cc1`): The compiler proper (`cc1`) takes the `.i` file and translates it into assembly language for your target architecture (a `.s` file).
- Assembly (`as`): The assembler (`as`) takes the `.s` file and translates it into a machine-code object file (`.o`). This file contains the raw binary instructions and data, but it’s not yet executable. It has unresolved references to functions and variables in other files.
- Linking (`ld`): The linker (`ld`) takes one or more object files (`.o`), along with static libraries (`.a`), and combines them. It resolves all the cross-references and produces a single executable file.
- Loading: When you run the executable, the operating system’s loader reads the file into memory, links any required shared libraries (`.so`), and creates a running process.
In practice, you usually drive this whole process with a single command, `gcc`.
You can use flags to tell `gcc` to stop after a specific stage, which is incredibly useful for debugging and understanding what the compiler is doing:
- `gcc -E foo.c`: Stop after preprocessing.
- `gcc -S foo.c`: Stop after compiling to assembly.
- `gcc -c foo.c`: Stop after assembling to an object file.
- `gcc -o bar foo.c bar.c`: Do the whole thing, compiling `foo.c` and `bar.c` and linking them into an executable named `bar`.
Summary
- C is a systems programming language! Its purpose is to program the system itself and to write high-performance code.
- Understanding C is about understanding how your program, the C compiler, and the computer system all interact with each other.
Control Flow in C
The good news is that C’s control flow statements are the blueprint for most modern languages. If you know Java, C#, or C++, this will look very familiar.
Conditional Statements
if (boolean_expression) {
    // statement_when_true
} else {
    // statement_when_false
}

switch (integer_expression) {
case CONSTANT_1:
    // statement
    break;
case CONSTANT_2:
    // statement
    break;
default:
    // statement
    break;
}

return (expression);
Note that `switch` in C only works on integer expressions (which includes `char` and `enum` values, but not strings or floats).
Loop Statements
for (initial; test; increment) {
    // statement
}

while (boolean_expression) {
    // statement
}

do {
    // statement
} while (boolean_expression);
The C `for` loop is essentially syntactic sugar for a `while` loop. It’s not a “for-each” iterator like in Python or Java.
Jump Statements
break; // Exits the innermost loop or switch.
continue; // Skips to the next iteration of the innermost loop.
goto Label; // Unconditional jump. Controversial!
Unlike Java, `break` and `continue` in C cannot take a label to break out of nested loops.
Functions
Functions in C are similar to methods in Java. They have a name, a return type, a list of argument types, and a body.
// Compute factorial function
// fact(n) = n * (n-1) * ... * 2 * 1
int fact(int n)
{
    if (n == 0) {
        return(1);
    } else {
        return(n * fact(n-1));
    }
}
The `main` function is just a special function that serves as the program’s entry point. Its signature allows it to receive command-line arguments from the shell.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i;

    // Print arguments from command line
    printf("argc = %d\n\n", argc);
    for (i = 0; i < argc; ++i) {
        printf("argv[%d]: %s\n", i, argv[i]);
    }
    return 0;
}
- `argc` (argument count) is an integer telling you how many strings are in the `argv` array.
- `argv` (argument vector) is an array of strings. By convention, `argv[0]` is the name of the program as it was invoked.
Basic I/O: printf()
`printf` is your go-to function for printing output. It’s part of the standard library, not the language itself, but it’s universally available.
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i = 314;
    const char s[] = "Mothy";

    printf("My name is %s and I work in STF H %d\n", s, i);
    return 0;
}
- The first argument is a format string.
- Special sequences starting with `%` are placeholders. `%s` is for a string, `%d` is for a decimal integer. There are many others (`man 3 printf` is your friend).
- The remaining arguments are the values to be substituted into the placeholders. They must match the type and order of the format specifiers.
- `printf` is a variadic function, meaning it can take a variable number of arguments.
Summary: Control Flow in C
- Functions: `return (...)`
- Loops: `for(..;..;..)`, `do .. while(..)`, `while(..)`
- Conditionals: `if (..) .. else ..`, `switch (..) case .. : ..; default ..`
- Jumps: `break`, `continue`, `goto ..`
- I/O: `printf()`
goto (or not)
The `goto` statement allows for an unconditional jump to a labeled point in the code. A fierce debate about its use raged through the late 1960s and the 1970s, famously kicked off by Edsger Dijkstra’s 1968 letter “Go To Statement Considered Harmful.” The consensus was that it led to “spaghetti code” that was impossible to reason about.
When to use goto
- Don’t.
- It is almost never a good idea.
- Arguments for `goto` on performance grounds are almost always wrong with modern compilers. A `switch` statement is just as fast for implementing a state machine.
There are, however, two rare and highly stylized situations where `goto` is considered acceptable, even idiomatic, in C.
- Early termination of multiple loops. Since C’s `break` can’t break out of nested loops, a `goto` can be a cleaner alternative to using boolean flags. A `goto found;` that jumps to a label just after the outermost loop is arguably more direct and easier to read than managing a `found` flag and the extra `if (found) break;` in each enclosing loop. It’s essentially emulating the `break <label>;` feature from other languages.
- Nested cleanup code. This is the most common legitimate use of `goto`, especially in systems code. It’s used for recovery code.
  - General Idea: A function performs a sequence of operations (e.g., allocating resources). Any one of them can fail. If an operation fails, all previous successful operations must be undone in reverse order.
  - Canonical Example: `malloc`’ing a sequence of buffers. If the third `malloc` fails, you must `free` the first two before returning an error.

  Trying to write this with structured `if-else` statements leads to deeply nested, unreadable code. The `goto` pattern is highly stylized and much cleaner. This pattern is found all over robust systems code, for example in the Linux kernel’s NFS implementation.
Summary: goto
- Don’t use `goto`!
- Except maybe:
  - For early termination of multiple loops.
  - For nested cleanup code.

If you’re not sure if you should use `goto`, the answer is no.
Basic Types in C
C’s type system is simple and maps closely to the underlying hardware.
Declarations and Scope
Declarations look like they do in Java. Their visibility, or scope, depends on where they are declared.
#include <stdio.h>

static int j = 0;       // Scope: this file only. Value persists.

int func(int j)         // Scope of this 'j': the function body.
{
    static int i = 0;   // Scope: this block only. Value persists across calls.
    i = i + 1;
    j = j + 1;
    printf("In func: i=%d, j=%d\n", i, j);
    return j;
}

int main(int argc, char *argv[])
{
    int i = 0;          // Scope: this block only.
    printf("In main: i=%d, j=%d\n", i, j);  // This 'j' is the global one.
    func(j);
    printf("In main: i=%d, j=%d\n", i, j);
    func(j);
    printf("In main: i=%d, j=%d\n", i, j);
    return 0;
}
- Outside a block: A declaration is global to the entire program. The `static` keyword restricts its visibility to the current file (compilation unit).
- Inside a block: A declaration is local to that block. The `static` keyword here does something different: it makes the variable’s storage permanent, so its value persists between calls to the function.
Integers and Floats
C provides several basic integer and floating-point types. A key, and sometimes frustrating, aspect of C is that the exact sizes of the integer types are implementation-defined.
- Integers are `signed` by default (except plain `char`, whose signedness is implementation-defined). You can use the `unsigned` keyword to declare an integer that cannot be negative (e.g., `unsigned int`).
- The sizes in the table are typical for modern systems, but not guaranteed. `int` is almost always 32 bits. `long` is 64 bits on 64-bit Linux and macOS, but stays 32 bits on 64-bit Windows.
- Floats and doubles follow the IEEE 754 standard on essentially all modern machines: `float` is 32 bits and `double` is 64 bits.
C99 Extended Integer Types
Because the built-in types are ambiguous, C99 introduced a standard header, `<stdint.h>`, that provides types with precise widths, such as `int32_t` and `uint64_t`. You should use these whenever you need an exact size.
The rules for arithmetic when mixing these different types are complicated. They are fixed by the C standard’s integer promotion and conversion rules rather than by the hardware, although the results depend on the type sizes of the platform. We’ll cover this in detail later.
Booleans
Historically, C had no boolean type. It simply used integers:
- False →
0
- True → anything non-zero
- The negation operator (`!`) turns zero into 1 and any non-zero value into 0.
C99 added a `bool` type (via `#include <stdbool.h>`), but it’s only a thin layer. `bool` is a macro for the built-in type `_Bool`, an unsigned integer type that can only hold 0 or 1.
A crucial C idiom stems from the fact that an assignment in C is also an expression, whose value is the value that was assigned. This allows for compact code where a value is assigned and tested in the same `if` condition.
// Idiom 1: Check for error return code
if ((rc = test(atoi(argv[1])))) {  // Assigns to rc, then tests if rc is non-zero
    fprintf(stderr, "Error: argument was out of range\n");
    return 1;
}

// Idiom 2: Check for NULL pointer
if (!(f = fopen(argv[1], "r"))) {  // Assigns to f, then tests if f is NULL (zero)
    perror("Failed to open file");
    return 1;
}
void
There is a type called `void`.
- It has no value.
- It is used for two main purposes:
  - Declaring functions that have no return value (procedures).
  - Declaring untyped pointers (`void *`), which are pointers to raw memory whose type is unknown.
const and enum
- `const`: Declaring a variable `const` means its value cannot be modified. The compiler will enforce this.

      $ cat const.c
      const int i = 0;

      int main(int argc, char *argv[])
      {
          i = i + 1;  // This will cause a compiler error
          return i;
      }
      $ gcc -o const const.c
      const.c:5:7: error: assignment of read-only variable 'i'

  This becomes very useful later with pointers.
- `enum`: An `enum` (enumeration) is a way to declare a set of named integer constants.

      $ cat enum.c
      #include <stdio.h>

      // Values start at 0 and increment: CAB=0, CNB=1, OAT=2, ...
      enum { CAB, CNB, OAT, STD, STF, WWA } dinfk_buildings;

      // Values can be assigned explicitly
      enum { CAB_N = 6, CNB_N = CAB_N + 1, STF_N = 114 } street_numbers;

      int main(int argc, char *argv[])
      {
          printf("CAB is %d; OAT is %d\n", CAB, OAT);
          printf("CNB_N is %d, STF_N is %d\n", CNB_N, STF_N);
          return 0;
      }
      $ ./enum
      CAB is 0; OAT is 2
      CNB_N is 7, STF_N is 114
Summary: C Basic Types
We’ve covered the fundamentals of C’s type system:
- Declarations
- Scopes and `static`
- Integers and floats, extended types
- Booleans
- `void`
- `const`
- `enum`