In our last lecture, we introduced a powerful concept: Kolmogorov complexity $K(x)$, which defines the information content of a string $x$ as the length of the shortest program that generates it.

A key takeaway is that we typically analyze $K(x)$ not for a single, specific string, but for an infinite sequence of strings. This is because for any single string, the complexity is subject to a "dirty constant": an additive factor that depends on the chosen programming language. By looking at the asymptotic behavior for longer and longer strings, this constant becomes negligible, and the true information content emerges.

Let’s warm up with an exercise to solidify this idea.

Training: Bounding Complexity for Structured Sequences

Consider the set of words of the form $x = 0^N$, where the length $N$ is a number of the form $N = 2^i \cdot 3^j \cdot 5^k$ for some natural numbers $i, j, k$. How can we find an upper bound on $K(x)$?

The strategy is always the same: design a short program that generates $x$.

# Program to generate x = 0^N where N = 2^i * 3^j * 5^k
def generate_x(i, j, k):
    N = 2**i * 3**j * 5**k
    print('0' * N)

To find the Kolmogorov complexity of a specific $x$, we need the shortest program that generates it. Our program generate_x needs the specific values of $i, j, k$ for that $x$. So, the program that generates $x$ would look like:

# A specific program for a specific x
i = 10   # The specific exponent for 2
j = 7    # The specific exponent for 3
k = 12   # The specific exponent for 5
N = 2**i * 3**j * 5**k
print('0' * N)

The length of this program is the length of the code template (a constant, $c$) plus the length of the binary representations of $i$, $j$, and $k$.

How large can $i$ be? Since $2^i \le N$, we have $i \le \log_2 N$, which means $i \le \log_2 |x|$. Similarly, $j \le \log_3 |x|$ and $k \le \log_5 |x|$. In all cases, the exponents are at most logarithmic in the length of $x$.

The number of bits needed to represent a number like $i$ is roughly $\log i$, so the number of bits for each of our exponents is roughly $\log \log |x|$.

Therefore, we can bound the complexity:

$$K(x) \le c + 3 \log \log |x| = O(\log \log |x|)$$

This shows that strings whose lengths have this specific prime factorization structure are highly compressible and have very low information content.
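
To make the compression gap concrete, here is a minimal sketch in Python; the helper description_bits is a simplified encoding invented for illustration, and it ignores the constant-size program template:

# Sketch: compare the size of the exponent-based description of x = 0^N
# with the size of x itself. description_bits is a simplified, illustrative
# encoding; a real description would add a constant-size program header.

def description_bits(i, j, k):
    # Bits needed to write each exponent in binary (at least 1 bit each).
    return sum(max(1, e.bit_length()) for e in (i, j, k))

i, j, k = 10, 7, 12
N = 2**i * 3**j * 5**k
print(N)                          # 546750000000000 -- x is ~5.5 * 10^14 zeros
print(description_bits(i, j, k))  # 11 -- the exponents fit in 11 bits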

Defining Randomness

Kolmogorov complexity gives us something that probability theory cannot: a way to define what it means for a single object to be random. The intuition is simple and beautiful:

An object is random if it has no pattern. The shortest way to describe it is to present the object itself.

In our formal language:

A binary string $x$ is random if it is incompressible. That is,

$$K(x) \ge |x|$$

A random string has no structure that a program could exploit to generate it from a shorter description.

Do Random Strings Exist?

It’s a fair question. Maybe every string has some hidden pattern that allows for compression. A simple counting argument shows this is not the case.

Lemma 2.5: For every natural number $n$, there exists a random binary string of length $n$.

Proof (by counting)

Let’s count two things:

  1. The number of strings of length $n$: There are exactly $2^n$ of them.
  2. The number of short descriptions: A description is a program. A "short" description is a program of length less than $n$. The number of possible binary strings (and thus, programs) of length less than $n$ is:

    $$2^0 + 2^1 + \dots + 2^{n-1} = 2^n - 1$$

We have $2^n$ strings to describe, but only $2^n - 1$ possible short descriptions. By the pigeonhole principle, there must be at least one string of length $n$ that has no description shorter than $n$. That string is, by definition, random.

In fact, we can make a much stronger statement: most strings are essentially random. A similar counting argument shows that, for every $n$, more than half of all binary strings of length $n$ cannot be compressed by even a single bit. Randomness is the norm, not the exception.
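
The counts behind this argument are easy to verify directly. A minimal sketch in Python (the choice n = 20 is arbitrary):

# Sketch: strings of length n versus binary programs of length < n.
n = 20
num_strings = 2**n                       # strings of length n
num_short_programs = 2**n - 1            # sum of 2^i for i = 0 .. n-1
print(num_strings - num_short_programs)  # 1: at least one random string

# Stronger claim: programs of length < n - 1 number 2**(n-1) - 1,
# so more than half of all strings of length n cannot be compressed
# by even a single bit.
print(num_strings - (2**(n-1) - 1))      # 524289 > 2**19 = 524288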

The Invariance Theorem: Robustness of the Definition

A crucial question arises: does $K(x)$ depend on our choice of programming language? If $K_{L_1}(x)$ is different from $K_{L_2}(x)$, the definition seems arbitrary.

The Invariance Theorem states that the choice of any universal programming language does not significantly change the Kolmogorov complexity.

Theorem

For any two universal programming languages $L_1$ and $L_2$, there exists a constant $c$ such that for all strings $x$:

$$|K_{L_1}(x) - K_{L_2}(x)| \le c$$

Proof Idea

We can write a program in language $L_1$ that acts as an interpreter for language $L_2$. This interpreter is a fixed program; its length is our constant $c$.

To generate $x$ using language $L_1$, we can construct a program that consists of two parts:

  1. The interpreter for language $L_2$ (size $c$).
  2. The shortest program for $x$ in language $L_2$ (size $K_{L_2}(x)$).

This combined program, written in language $L_1$, first interprets and then executes the program from language $L_2$, ultimately producing $x$. Its total length is $c + K_{L_2}(x)$. Since $K_{L_1}(x)$ is the length of the shortest program in $L_1$, it must be less than or equal to this length: $K_{L_1}(x) \le K_{L_2}(x) + c$. By the symmetric argument (an interpreter for $L_1$ written in $L_2$), the reverse inequality also holds with some constant, which gives the theorem.
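
Here is a minimal sketch of this construction in Python, treating programs as plain strings; INTERPRETER_FOR_L2 is a stand-in invented for illustration, not a real interpreter:

# Sketch: the two-part program from the proof, with programs modeled
# as strings. INTERPRETER_FOR_L2 stands for a fixed piece of L1 source
# code that can simulate any L2 program; its length is the constant c.

INTERPRETER_FOR_L2 = "...fixed L1 source that simulates L2..."
c = len(INTERPRETER_FOR_L2)

def program_in_L1(shortest_L2_program_for_x):
    # Concatenation: interpreter (length c) + L2 program (length K_L2(x)).
    # Running the result under L1 produces x, which shows
    # K_L1(x) <= K_L2(x) + c.
    return INTERPRETER_FOR_L2 + shortest_L2_program_for_x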

This is a powerful result. It means that while the absolute value of might shift by a constant, the concept itself is robust and independent of the specific computational model, as long as that model is universal (i.e., can simulate any other model).

Application: A New Proof for the Infinitude of Primes

Kolmogorov complexity is not just a philosophical curiosity; it’s a powerful proof tool. Let’s use it to prove a cornerstone of mathematics: there are infinitely many prime numbers.

Proof (by contradiction)

  1. Assumption: Assume there are only a finite number of primes: $p_1, p_2, \dots, p_m$.

  2. Representation: Any natural number $n$ can be uniquely represented by its prime factorization using this finite set of primes:

    $$n = p_1^{e_1} \cdot p_2^{e_2} \cdots p_m^{e_m}$$

    This means we can fully describe the number $n$ by providing the list of its exponents: $(e_1, e_2, \dots, e_m)$.

  3. Compression: Let's build a program to generate $n$. The program needs the list of exponents as input. The primes are fixed and can be hardcoded into the program.

    • How large are the exponents? Since $p_i^{e_i} \le n$ and every $p_i \ge 2$, we know that each exponent $e_i$ is at most $\log_2 n$.
    • The number of bits needed to represent each exponent is therefore about $\log \log n$.
    • Since there are a fixed number ($m$) of exponents, the total length of the description for $n$ is a constant (for the program logic) plus $m \cdot \log \log n$.
    • This implies that for any number $n$, its Kolmogorov complexity is bounded by:

    $$K(n) \le c + m \cdot \log \log n = O(\log \log n)$$
  4. Contradiction: We know that there exist random numbers. For any constant $c$, we can find a sufficiently large random number $n$ (one whose binary representation is incompressible) such that:

    $$K(n) \ge \log_2 n$$

    So we have two bounds on $K(n)$:

    $$\log_2 n \le K(n) \le c + m \cdot \log \log n$$

    The function $\log n$ grows much faster than $\log \log n$. For a large enough $n$, this inequality cannot possibly hold. This is a contradiction (see the numeric sketch after this proof).

  5. Conclusion: Our initial assumption—that there are only finitely many primes—must be false.
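
To watch the two bounds collide, here is a small Python sketch; the values m = 4 and c = 100 are arbitrary illustrative choices, not derived from anything:

import math

# Sketch: under the (false) finitely-many-primes assumption, a random n
# would have to satisfy  log2(n) <= c + m * log2(log2(n)).
# The left side eventually outgrows the right, so the bound must break.
m, c = 4, 100                     # arbitrary illustrative values

for log2_n in (64, 1024, 2**20):  # log2(n), i.e. the bit-length of n
    rhs = c + m * math.log2(log2_n)
    print(log2_n, rhs, log2_n <= rhs)
# 64      124.0  True
# 1024    140.0  True
# 1048576 180.0  False  <- contradiction for large enough n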

The Uncomputability of Kolmogorov Complexity

We have a beautiful, robust definition of information. Now for the catch:

Theorem

The function $K$ is not computable. There is no algorithm that takes a string $x$ as input and outputs the integer $K(x)$.

Proof (by contradiction)

  1. Assumption: Assume there exists an algorithm ComputeK(x) that calculates $K(x)$.

  2. Construction: We can use this algorithm to build a new program, FindComplexString(n), that does the following:

    • It takes an integer $n$ as input.
    • It generates all binary strings $y_1, y_2, y_3, \dots$ in canonical order.
    • For each string $y_i$, it calls ComputeK(y_i).
    • It stops and outputs the first string it finds for which ComputeK(y_i) returns a value $\ge n$. Let's call this string $x_n$. (A code sketch of this construction appears after the proof.)
  3. Analysis: The program FindComplexString(n) generates the string $x_n$. What is the Kolmogorov complexity of $x_n$? It's the length of the shortest program that generates it. Our program FindComplexString is one such program.

    • The code for FindComplexString is fixed; its length is a constant $c$.
    • The only input it needs is the integer $n$. The length of the binary representation of $n$ is about $\log n$.
    • Therefore, we have an upper bound on the complexity of $x_n$:

    $$K(x_n) \le c + \log n$$
  4. Contradiction: By its very construction, $x_n$ is a string whose complexity is at least $n$:

    $$K(x_n) \ge n$$

    Combining our two findings, we get:

    $$n \le K(x_n) \le c + \log n$$

    For any fixed constant $c$, we can choose an $n$ large enough that $n > c + \log n$. This is a contradiction.

  5. Conclusion: Our initial assumption must be false. No algorithm to compute $K(x)$ can exist.
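
Here is a minimal Python sketch of the construction from step 2, assuming the hypothetical compute_K oracle that the theorem rules out:

from itertools import count, product

def find_complex_string(n, compute_K):
    # Enumerate binary strings in canonical order (by length, then
    # lexicographically) and return the first one whose complexity
    # is at least n.
    for length in count(0):
        for bits in product('01', repeat=length):
            y = ''.join(bits)
            if compute_K(y) >= n:
                return y  # this is x_n, with K(x_n) >= n by construction

# This function is a fixed piece of code (constant size c) whose only
# input is n (about log n bits), so it would witness K(x_n) <= c + log n,
# contradicting K(x_n) >= n.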

This is a profound and somewhat unsettling result. We have found what seems to be the “correct” definition of information and randomness, but it is a concept that we can reason about but never fully calculate.