In our last lecture, we introduced a powerful concept: Kolmogorov complexity $K(x)$, which defines the information content of a string $x$ as the length of the shortest program that generates it.

A key takeaway is that we typically analyze $K(x)$ not for a single, specific string, but for an infinite sequence of strings. This is because for any single string, the complexity is subject to a "dirty constant": an additive factor that depends on the chosen programming language. By looking at the asymptotic behavior for longer and longer strings, this constant becomes negligible, and the true information content emerges.

Let’s warm up with an exercise to solidify this idea.

Training: Bounding Complexity for Structured Sequences

Consider the set of words of the form $x = 0^N$, where the length $N$ is a number of the form $N = 2^i \cdot 3^j \cdot 5^k$ for some natural numbers $i, j, k$. How can we find an upper bound on $K(x)$?

The strategy is always the same: design a short program that generates $x$.

# Program to generate x = 0^N where N = 2^i * 3^j * 5^k
def generate_x(i, j, k):
    N = 2**i * 3**j * 5**k
    print('0' * N)

To find the Kolmogorov complexity of a specific $x$, we need the shortest program that generates it. Our program generate_x needs the specific values of $i, j, k$ for that $x$. So, the program that generates $x$ would look like:

# A specific program for a specific x
i = 10   # The specific exponent for 2
j = 7    # The specific exponent for 3
k = 12   # The specific exponent for 5
N = 2**i * 3**j * 5**k
print('0' * N)

The length of this program is the length of the code template (a constant, $c$) plus the length of the binary representations of $i$, $j$, and $k$.

How large can $i$ be? Since $2^i \le N$, we have $i \le \log_2 N$, which means $i \le \log_2 |x|$. Similarly, $j \le \log_3 |x|$ and $k \le \log_5 |x|$. In all cases, the exponents are at most logarithmic in the length of $x$.

The number of bits needed to represent a number like $i$ is roughly $\log i$, so the number of bits for each of our exponents is roughly $\log \log |x|$.

Therefore, we can bound the complexity:

$$K(x) \le c + 3 \log \log |x| = O(\log \log |x|)$$

This shows that strings whose lengths have this specific prime factorization structure are highly compressible and have very low information content.
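
To make the compression gap concrete, here is a minimal sketch in Python; the helper description_bits is a simplified encoding invented for illustration, and it ignores the constant-size program template:

# Sketch: compare the size of the exponent-based description of x = 0^N
# with the size of x itself. description_bits is a simplified, illustrative
# encoding; a real description would add a constant-size program header.

def description_bits(i, j, k):
    # Bits needed to write each exponent in binary (at least 1 bit each).
    return sum(max(1, e.bit_length()) for e in (i, j, k))

i, j, k = 10, 7, 12
N = 2**i * 3**j * 5**k
print(N)                          # 546750000000000 -- x is ~5.5 * 10^14 zeros
print(description_bits(i, j, k))  # 11 -- the exponents fit in 11 bits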

Defining Randomness

Kolmogorov complexity gives us something that probability theory cannot: a way to define what it means for a single object to be random. The intuition is simple and beautiful:

An object is random if it has no pattern. The shortest way to describe it is to present the object itself.

In our formal language:

A binary string $x$ is random if it is incompressible. That is,

$$K(x) \ge |x|$$

A random string has no structure that a program could exploit to generate it from a shorter description.

Do Random Strings Exist?

It’s a fair question. Maybe every string has some hidden pattern that allows for compression. A simple counting argument shows this is not the case.

Lemma 2.5: For every natural number $n$, there exists a random binary string of length $n$.

Proof (by counting)

Let’s count two things:

  1. The number of strings of length $n$: There are exactly $2^n$ of them.
  2. The number of short descriptions: A description is a program. A "short" description is a program of length less than $n$. The number of possible binary strings (and thus, programs) of length less than $n$ is:

    $$2^0 + 2^1 + \dots + 2^{n-1} = 2^n - 1$$

We have $2^n$ strings to describe, but only $2^n - 1$ possible short descriptions. By the pigeonhole principle, there must be at least one string of length $n$ that has no description shorter than $n$. That string is, by definition, random.

In fact, we can make a much stronger statement: most strings are essentially random. A similar counting argument shows that, for every $n$, more than half of all binary strings of length $n$ cannot be compressed by even a single bit. Randomness is the norm, not the exception.
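
The counts behind this argument are easy to verify directly. A minimal sketch in Python (the choice n = 20 is arbitrary):

# Sketch: strings of length n versus binary programs of length < n.
n = 20
num_strings = 2**n                       # strings of length n
num_short_programs = 2**n - 1            # sum of 2^i for i = 0 .. n-1
print(num_strings - num_short_programs)  # 1: at least one random string

# Stronger claim: programs of length < n - 1 number 2**(n-1) - 1,
# so more than half of all strings of length n cannot be compressed
# by even a single bit.
print(num_strings - (2**(n-1) - 1))      # 524289 > 2**19 = 524288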

The Invariance Theorem: Robustness of the Definition

A crucial question arises: does $K(x)$ depend on our choice of programming language? If $K_{L_1}(x)$ is different from $K_{L_2}(x)$, the definition seems arbitrary.

The Invariance Theorem states that the choice of any universal programming language does not significantly change the Kolmogorov complexity.

Theorem

For any two universal programming languages $L_1$ and $L_2$, there exists a constant $c$ such that for all strings $x$:

$$|K_{L_1}(x) - K_{L_2}(x)| \le c$$

Proof Idea

We can write a program in language $L_1$ that acts as an interpreter for language $L_2$. This interpreter is a fixed program; its length is our constant $c$.

To generate $x$ using language $L_1$, we can construct a program that consists of two parts:

  1. The interpreter for language $L_2$ (size $c$).
  2. The shortest program for $x$ in language $L_2$ (size $K_{L_2}(x)$).

This combined program, written in language $L_1$, first interprets and then executes the program from language $L_2$, ultimately producing $x$. Its total length is $c + K_{L_2}(x)$. Since $K_{L_1}(x)$ is the length of the shortest program in $L_1$, it must be less than or equal to this length: $K_{L_1}(x) \le K_{L_2}(x) + c$. By the symmetric argument (an interpreter for $L_1$ written in $L_2$), the reverse inequality also holds with some constant, which gives the theorem.
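
Here is a minimal sketch of this construction in Python, treating programs as plain strings; INTERPRETER_FOR_L2 is a stand-in invented for illustration, not a real interpreter:

# Sketch: the two-part program from the proof, with programs modeled
# as strings. INTERPRETER_FOR_L2 stands for a fixed piece of L1 source
# code that can simulate any L2 program; its length is the constant c.

INTERPRETER_FOR_L2 = "...fixed L1 source that simulates L2..."
c = len(INTERPRETER_FOR_L2)

def program_in_L1(shortest_L2_program_for_x):
    # Concatenation: interpreter (length c) + L2 program (length K_L2(x)).
    # Running the result under L1 produces x, which shows
    # K_L1(x) <= K_L2(x) + c.
    return INTERPRETER_FOR_L2 + shortest_L2_program_for_x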

This is a powerful result. It means that while the absolute value of might shift by a constant, the concept itself is robust and independent of the specific computational model, as long as that model is universal (i.e., can simulate any other model).

Application: A New Proof for the Infinitude of Primes

Kolmogorov complexity is not just a philosophical curiosity; it’s a powerful proof tool. Let’s use it to prove a cornerstone of mathematics: there are infinitely many prime numbers.

Proof (by contradiction)

  1. Assumption: Assume there are only a finite number of primes: $p_1, p_2, \dots, p_m$.

  2. Representation: Any natural number $n$ can be uniquely represented by its prime factorization using this finite set of primes:

    $$n = p_1^{e_1} \cdot p_2^{e_2} \cdots p_m^{e_m}$$

    This means we can fully describe the number $n$ by providing the list of its exponents: $(e_1, e_2, \dots, e_m)$.

  3. Compression: Let's build a program to generate $n$. The program needs the list of exponents as input. The primes are fixed and can be hardcoded into the program.

    • How large are the exponents? Since $p_i^{e_i} \le n$ and every $p_i \ge 2$, we know that each exponent $e_i$ is at most $\log_2 n$.
    • The number of bits needed to represent each exponent is therefore about $\log \log n$.
    • Since there are a fixed number ($m$) of exponents, the total length of the description for $n$ is a constant (for the program logic) plus $m \cdot \log \log n$.
    • This implies that for any number $n$, its Kolmogorov complexity is bounded by:

    $$K(n) \le c + m \cdot \log \log n = O(\log \log n)$$
  4. Contradiction: We know that there exist random numbers. For any constant $c$, we can find a sufficiently large random number $n$ (one whose binary representation is incompressible) such that:

    $$K(n) \ge \log_2 n$$

    So we have two bounds on $K(n)$:

    $$\log_2 n \le K(n) \le c + m \cdot \log \log n$$

    The function $\log n$ grows much faster than $\log \log n$. For a large enough $n$, this inequality cannot possibly hold. This is a contradiction (see the numeric sketch after this proof).

  5. Conclusion: Our initial assumption—that there are only finitely many primes—must be false.
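
To watch the two bounds collide, here is a small Python sketch; the values m = 4 and c = 100 are arbitrary illustrative choices, not derived from anything:

import math

# Sketch: under the (false) finitely-many-primes assumption, a random n
# would have to satisfy  log2(n) <= c + m * log2(log2(n)).
# The left side eventually outgrows the right, so the bound must break.
m, c = 4, 100                     # arbitrary illustrative values

for log2_n in (64, 1024, 2**20):  # log2(n), i.e. the bit-length of n
    rhs = c + m * math.log2(log2_n)
    print(log2_n, rhs, log2_n <= rhs)
# 64      124.0  True
# 1024    140.0  True
# 1048576 180.0  False  <- contradiction for large enough n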

The Uncomputability of Kolmogorov Complexity

We have a beautiful, robust definition of information. Now for the catch:

Theorem

The function $K$ is not computable. There is no algorithm that takes a string $x$ as input and outputs the integer $K(x)$.

Proof (by contradiction)

  1. Assumption: Assume there exists an algorithm ComputeK(x) that calculates $K(x)$.

  2. Construction: We can use this algorithm to build a new program, FindComplexString(n), that does the following:

    • It takes an integer $n$ as input.
    • It generates all binary strings $y_1, y_2, y_3, \dots$ in canonical order.
    • For each string $y_i$, it calls ComputeK(y_i).
    • It stops and outputs the first string it finds for which ComputeK(y_i) returns a value $\ge n$. Let's call this string $x_n$. (A code sketch of this construction appears after the proof.)
  3. Analysis: The program FindComplexString(n) generates the string $x_n$. What is the Kolmogorov complexity of $x_n$? It's the length of the shortest program that generates it. Our program FindComplexString is one such program.

    • The code for FindComplexString is fixed; its length is a constant $c$.
    • The only input it needs is the integer $n$. The length of the binary representation of $n$ is about $\log n$.
    • Therefore, we have an upper bound on the complexity of $x_n$:

    $$K(x_n) \le c + \log n$$
  4. Contradiction: By its very construction, $x_n$ is a string whose complexity is at least $n$:

    $$K(x_n) \ge n$$

    Combining our two findings, we get:

    $$n \le K(x_n) \le c + \log n$$

    For any fixed constant $c$, we can choose an $n$ large enough that $n > c + \log n$. This is a contradiction.

  5. Conclusion: Our initial assumption must be false. No algorithm to compute $K(x)$ can exist.
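
Here is a minimal Python sketch of the construction from step 2, assuming the hypothetical compute_K oracle that the theorem rules out:

from itertools import count, product

def find_complex_string(n, compute_K):
    # Enumerate binary strings in canonical order (by length, then
    # lexicographically) and return the first one whose complexity
    # is at least n.
    for length in count(0):
        for bits in product('01', repeat=length):
            y = ''.join(bits)
            if compute_K(y) >= n:
                return y  # this is x_n, with K(x_n) >= n by construction

# This function is a fixed piece of code (constant size c) whose only
# input is n (about log n bits), so it would witness K(x_n) <= c + log n,
# contradicting K(x_n) >= n.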

This is a profound and somewhat unsettling result. We have found what seems to be the “correct” definition of information and randomness, but it is a concept that we can reason about but never fully calculate.