02 Introduction to C

Now that we’ve set the stage and explored the mindset of a systems programmer, it’s time to introduce our primary tool: the C programming language. C is old, simple, and powerful. It’s the lingua franca of systems programming for a reason. It doesn’t hold your hand, but it gives you unparalleled control over the machine.

History and Toolchain

History

To understand C, it helps to know where it came from.

Developed between 1969 and 1973 by Dennis Ritchie at the legendary Bell Labs. (Brian Kernighan co-authored the classic book on C, but the language itself was Ritchie's design.)
It didn’t appear in a vacuum. It has a clear lineage:
- CPL (Combined Programming Language, 1963) was a massive, complex language, famously considered “unimplementable.”
- BCPL (Basic CPL, 1967) was a radical simplification, stripping the language down to a single data type: the machine word. It was a kind of portable assembly.
- B (1969) was Ken Thompson’s adaptation of BCPL for the new Unix operating system.
- C was Ritchie’s successor to B, adding back a simple type system.
It was highly influenced by the DEC PDP-11 architecture, the machine Unix was being ported to. Some of C’s quirks make a lot more sense when you know the PDP-11’s instruction set.
Despite its specific origins, it was designed to be portable across many architectures, which was a key factor in its success and the success of Unix.

Standards

K&R C: The original “standard” was simply the language described in Kernighan and Ritchie’s book, The C Programming Language. The compiler’s source code was the ultimate specification!
ANSI C (C89/C90): The first formal standard.
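C99: A major update that added many useful features, such as // comments, <stdint.h>, <stdbool.h>, and declarations anywhere in a block. This is the version we’ll use in this course.
C11, C17, C23: More recent revisions. C11 added features such as threads and atomics, C17 was mostly a bug-fix release, and C23 is the newest.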
…and many C-like variants that have borrowed its syntax and philosophy.

C’s Enduring Popularity

More than 50 years on, C is still ubiquitous. Look at the TIOBE Programming Community Index, which tracks language popularity. C has consistently been at or near the top for decades.

Why does this old, seemingly primitive language persist?

Compared to Java, C#, PHP, Python, etc…

C offers a unique set of trade-offs that keep it relevant.

It is very fast.
- A good C compiler generates highly optimized machine code. For most code, it is hard to write assembly by hand that is consistently faster.
- Managed languages like Java pay a cost for their layers of abstraction (the JVM, garbage collection). A modern JIT compiler can close much of the gap, but rarely with C's predictability.
It has a powerful macro preprocessor (cpp). This is a text-substitution tool that runs before the compiler proper. It’s a blunt instrument, but it enables a lot of powerful idioms.
It is close to the metal. This is the killer feature. When you look at a line of C, you can usually predict what it will do to the hardware: there is no garbage collector, virtual machine, or other hidden runtime acting behind your back.

Because of these characteristics, C remains the language of choice for:

Operating System developers
Embedded systems programmers
People who really care about speed (high-performance computing, game engines)
Authors of security exploits (who need precise control over the machine’s state)

What You Don’t Get

C’s power comes from its simplicity, which means it lacks many features you might take for granted.

No objects, classes, traits, features, methods, or interfaces. C is a procedural language. You have data and you have functions that operate on that data. That’s it. (We’ll see function pointers later; they let you build object-like systems yourself.)
No fancy built-in types. There’s no built-in string or list type. The types provided are mostly just what the hardware provides (integers and floats of various sizes). You use type constructors (struct) to build more complex types yourself.
No exception handling. There is no try/catch. The most common convention is to use integer return codes: a function returns 0 for success and a non-zero value to indicate an error.

The Most Important Difference

If there’s one thing to take away about what makes C different, it’s this:

No automatic memory management.
- There is no garbage collection.
- Local variables live on the stack and disappear when their function returns. Data that must outlive a function call goes on the heap (or in static storage).
- Heap structures must be explicitly created and freed by the programmer. This is a huge source of bugs, but also a huge source of C’s performance and predictability.
Pointers: direct access to memory addresses. A pointer is not a “reference”; it’s a variable that holds a memory address, a raw number. Its type says what kind of object it points to, but a cast lets you make it point to anything.


C is about directly building and manipulating structures in main memory! This is the fundamental mental model. You’re not working with abstract objects; you’re arranging bytes in memory.

Syntax: The Good News & The Bad News

The good news is that C’s syntax is probably already familiar to you.

The syntax of Java, JavaScript, C++, and C# was largely borrowed from C.
Comments (/* ... */, //) are the same.
Identifiers are mostly the same as in Java.
Block structure using { ... } is the C way.

The main differences are subtle but important:

The list of reserved words is different.
C is run through a macro preprocessor. This is a separate step that performs text substitution, file inclusion, and conditional compilation before the actual compiler sees the code. C# has preprocessor directives, but they are not handled by a separate, text-based stage as in C.

Hello, World!

Here is the canonical first program in C. It’s a good way to see the basic structure.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    printf("hello, world\n");
    return 0;
}

Let’s break it down:

#include <stdio.h>: This is a preprocessor directive. It tells the preprocessor to find the “header file” stdio.h and paste its contents right here. This file contains declarations for standard input/output functions, like printf. It’s a bit like an interface file in Java or C#.
int main(...): Every C program must have a main function. This is the entry point where execution begins. It takes a list of command-line arguments and returns an integer status code.
printf(...): This is a standard library function for printing formatted strings. The \n is an escape sequence for a newline character; unlike some languages, newlines are not added automatically.
return 0;: This exits the main function and returns a status code of 0 to the operating system. By convention, 0 means “everything is OK.” Since C has no exceptions, this is the primary way to signal success or failure.

The C Toolchain: From Source to Execution

How does that simple text file become a running program? It’s a multi-stage process called the toolchain. In C, these stages are explicit, and you can stop the process at any point to inspect the intermediate results.

Editing: You start by writing source code in text files, typically with .c (source) and .h (header) extensions.
Preprocessing (cpp): The preprocessor (cpp) takes a .c file, handles all the #include, #define, and other directives, and produces a single, large, preprocessed C source file (often with a .i extension).
Compilation (cc1): The compiler proper (cc1) takes the .i file and translates it into assembly language for your target architecture (a .s file).
Assembly (as): The assembler (as) takes the .s file and translates it into a machine-code object file (.o). This file contains the raw binary instructions and data, but it’s not yet executable. It has unresolved references to functions and variables in other files.
Linking (ld): The linker (ld) takes one or more object files (.o), along with static libraries (.a), and combines them. It resolves all the cross-references and produces a single executable file.
Loading: When you run the executable, the operating system’s loader maps the file into memory, the dynamic linker loads and links any required shared libraries (.so), and the program starts running as a process.

In practice, you usually drive this whole process with a single command, gcc.

You can use flags to tell gcc to stop after a specific stage, which is incredibly useful for debugging and understanding what the compiler is doing:

gcc -E foo.c: Stop after preprocessing (the result is written to standard output).
gcc -S foo.c: Stop after compiling to assembly (writes foo.s).
gcc -c foo.c: Stop after assembling to an object file (writes foo.o).
gcc -o bar foo.c bar.c: Do the whole thing: compile foo.c and bar.c, then link them into an executable named bar.

Summary

C is a systems programming language! Its purpose is to program the system itself and to write high-performance code.
Understanding C is about understanding how your program, the C compiler, and the computer system all interact with each other.

Control Flow in C

The good news is that C’s control flow statements are the blueprint for most modern languages. If you know Java, C#, or C++, this will look very familiar.

Conditional Statements

if (boolean_expression) {
    // statement_when_true
} else {
    // statement_when_false
}
 
switch (integer_expression) {
    case CONSTANT_1:
        // statement
        break;
    case CONSTANT_2:
        // statement
        break;
    default:
        // statement
        break;
}
 
return (expression);

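Note that switch in C only works on integer expressions (char and enum values qualify, since they are integers), and each case label must be an integer constant. You cannot switch on a string. Also, without a break, execution “falls through” into the next case.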

Loop Statements

for (initial; test; increment) {
    // statement
}
 
while (boolean_expression) {
    // statement
}
 
do {
    // statement
} while (boolean_expression);

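The C for loop is essentially shorthand for a while loop: for (init; test; incr) body behaves like init; while (test) { body; incr; }, except that continue still runs incr. It’s not a “for-each” iterator like in Python or Java.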

Jump Statements

break;      // Exits the innermost loop or switch.
continue;   // Skips to the next iteration of the innermost loop.
goto Label; // Unconditional jump. Controversial!

Unlike Java, break and continue in C cannot take a label to break out of nested loops.

Functions

Functions in C are similar to methods in Java. They have a name, a return type, a list of argument types, and a body.

// Compute factorial function
// fact(n) = n * (n-1) * ... * 2 * 1
int fact(int n)
{
    if (n == 0) {
        return(1);
    } else {
        return(n * fact(n-1));
    }
}

The main function is just a special function that serves as the program’s entry point. Its signature allows it to receive command-line arguments from the shell.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    int i;
 
    // Print arguments from command line
    printf("argc = %d\n\n", argc);
    for (i = 0; i < argc; ++i) {
        printf("argv[%d]: %s\n", i, argv[i]);
    }
    return 0;
}

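argc (argument count) is an integer telling you how many strings are in the argv array.
argv (argument vector) is an array of strings. By convention, argv[0] is the name the program was invoked with, so the actual arguments start at argv[1]. The array is terminated by a null pointer: argv[argc] is NULL.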

Basic I/O: `printf()`

printf is your go-to function for printing output. It’s part of the standard library, not the language itself, but it’s universally available.

#include <stdio.h>
 
int main(int argc, char *argv[])
{
    int i = 314;
    const char s[] = "Mothy";
    printf("My name is %s and I work in STF H %d\n", s, i);
    return 0;
}

The first argument is a format string.
Special sequences starting with % are placeholders. %s is for a string, %d is for a decimal integer. There are many others (man 3 printf is your friend).
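The remaining arguments are the values to be substituted into the placeholders. They must match the type and order of the format specifiers. A mismatch is undefined behavior and is not caught at run time, although gcc -Wall warns about many mismatches at compile time.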
printf is a variadic function, meaning it can take a variable number of arguments.

Summary: Control Flow in C

Functions: return (...)
Loops: for(..;..;..), do .. while(..), while(..)
Conditionals: if (..) .. else .., switch (..) case .. : ..; default ..
Jumps: break, continue, goto ..
I/O: printf()

`goto` (or not)

The goto statement allows for an unconditional jump to a labeled point in the code. In the late 1960s and 1970s, a fierce debate raged about its use, famously kicked off by Edsger Dijkstra’s 1968 letter “Go To Statement Considered Harmful.” The consensus was that it led to “spaghetti code” that was impossible to reason about.

When to use `goto`

Don’t.
- It is almost never a good idea.
- Arguments for goto on performance grounds are almost always wrong with modern compilers. A switch statement is typically just as fast for implementing a state machine.

There are, however, two rare and highly stylized situations where goto is considered acceptable, even idiomatic, in C.

Early termination of multiple loops. Since C’s break can’t break out of nested loops, a goto can be a cleaner alternative to using boolean flags.

Compared with the flag-based version, where every enclosing loop needs its own if (found) break;, a single goto found; is arguably more direct and easier to read. It essentially emulates the labeled break (break <label>;) offered by other languages.
Nested cleanup code. This is the most common legitimate use of goto, especially in systems code. It’s used for recovery code.
- General Idea: A function performs a sequence of operations (e.g., allocating resources). Any one of them can fail. If an operation fails, all previous successful operations must be undone in reverse order.
- Canonical Example: malloc’ing a sequence of buffers. If the third malloc fails, you must free the first two before returning an error.
Trying to write this with structured if-else statements leads to deeply nested, unreadable code. The goto pattern is highly stylized and much cleaner.

This pattern is found all over robust systems code, including the Linux kernel (its NFS implementation is one example).

Summary: `goto`

Don’t use goto!
Except maybe:
- For early termination of multiple loops.
- For nested cleanup code.

If you’re not sure if you should use goto, the answer is no.

Basic Types in C

C’s type system is simple and maps closely to the underlying hardware.

Declarations and Scope

Declarations look like they do in Java. Their visibility, or scope, depends on where they are declared.

#include <stdio.h>
 
static int j = 0; // Scope: this file only. Value persists.
 
int func(int j) // Scope of this 'j': the function body.
{
    static int i = 0; // Scope: this block only. Value persists across calls.
 
    i = i + 1;
    j = j + 1;
    printf("In func: i=%d, j=%d\n", i, j);
    return j;
}
 
int main(int argc, char *argv[])
{
    int i = 0; // Scope: this block only.
 
    printf("In main: i=%d, j=%d\n", i, j); // This 'j' is the global one.
    func(j); // Passes a copy: the global j stays 0.
    printf("In main: i=%d, j=%d\n", i, j);
    func(j); // func's static i keeps counting across calls.
    printf("In main: i=%d, j=%d\n", i, j);
    return 0;
}

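Outside a block: A declaration has file scope and, by default, external linkage, so it is global to the entire program (other files can reach it by declaring it extern). The static keyword restricts its visibility to the current file (compilation unit).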
Inside a block: A declaration is local to that block. The static keyword here does something different: it makes the variable’s storage permanent, so its value persists between calls to the function.

Integers and Floats

C provides several basic integer and floating-point types. A key, and sometimes frustrating, aspect of C is that the exact sizes of integer types are implementation defined.

Integers are signed by default (except plain char, whose signedness is implementation defined). You can use the unsigned keyword to declare an integer that cannot be negative (e.g., unsigned int).
The standard only guarantees minimum sizes. On typical modern systems, char is 8 bits, short is 16, int is 32, and long long is 64. long is 64 bits on 64-bit Linux and macOS, but only 32 bits on 64-bit Windows.
In practice, float and double follow the IEEE 754 standard: float is 32 bits and double is 64 bits on virtually all modern machines.

C99 Extended Integer Types

Because the sizes of the built-in types vary between platforms, C99 introduced a standard header, <stdint.h>, that provides types with precise widths, such as int8_t, uint16_t, int32_t, and uint64_t. You should use these whenever you need an exact size.

The rules for arithmetic when mixing these different types are complicated. They are set by C’s integer promotion and “usual arithmetic conversion” rules, whose results depend on the sizes of the types on your platform. We’ll cover this in detail later.

Booleans

Historically, C had no boolean type. It simply used integers:

False → 0
True → anything non-zero
The logical negation operator (!) turns zero into 1 and any non-zero value into 0.

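C99 added a boolean type, _Bool; #include <stdbool.h> gives it the friendlier name bool, along with true (1) and false (0). Underneath, it’s still an integer type, with one twist: converting any non-zero value to bool yields exactly 1.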

A crucial C idiom stems from the fact that assignment in C is an expression: its value is the value that was assigned. This allows for compact code where a value is assigned and tested in the same if condition.

// Idiom 1: Check for error return code
if ((rc = test(atoi(argv[1])))) { // Assigns to rc, then tests if rc is non-zero
    fprintf(stderr, "Error: argument was out of range\n");
    return 1;
}
 
// Idiom 2: Check for NULL pointer
if (!(f = fopen(argv[1], "r"))) { // Assigns to f, then tests if f is NULL (zero)
    perror("Failed to open file");
    return 1;
}

`void`

There is a type called void.

It has no value.
It is used for two main purposes:
1. Declaring functions that have no return value (procedures).
2. Declaring untyped pointers (void *), which are pointers to raw memory whose type is unknown.

`const` and `enum`

const: Declaring a variable const means its value cannot be modified. The compiler will enforce this.

$ cat const.c
const int i = 0;
int main(void) {
    i = i + 1; // This will cause a compiler error
    return i;
}
$ gcc -o const const.c
const.c:5:7: error: assignment of read-only variable 'i'

This becomes very useful later with pointers.

enum: An enum (enumeration) is a way to declare a set of named integer constants.

$ cat enum.c
#include <stdio.h>
// Values start at 0 and increment: CAB=0, CNB=1, OAT=2, ...
enum { CAB, CNB, OAT, STD, STF, WWA } dinfk_buildings; // dinfk_buildings is a variable of this enum type

// Values can be assigned explicitly
enum { CAB_N = 6, CNB_N = CAB_N + 1, STF_N = 114 } street_numbers;

int main(void) {
    printf("CAB is %d; OAT is %d\n", CAB, OAT);
    printf("CNB_N is %d, STF_N is %d\n", CNB_N, STF_N);
    return 0;
}
$ ./enum
CAB is 0; OAT is 2
CNB_N is 7, STF_N is 114

Summary: C Basic Types

We’ve covered the fundamentals of C’s type system:

Declarations
Scopes and static
Integers and floats, extended types
Booleans
void
const
enum
