We’ve established our basic vocabulary: alphabets, words, and languages. Now, let’s use these tools to formally describe what an algorithmic problem is.

You might wonder why this is necessary. Isn’t it enough to say “the problem of multiplying two numbers” or “the problem of finding the shortest tour”? It turns out, that’s not nearly precise enough. The way we represent the input to a problem is a fundamental part of the problem itself.

Why Encoding Matters

As computer scientists, we don’t just solve problems; we process information. The form that information takes can drastically change a problem’s difficulty.

Example 1: Multiplication

  • Problem: Multiply two numbers.
  • Encoding 1: Decimal Representation. If the numbers are given as strings of digits (e.g., “123”, “456”), the standard schoolbook algorithm takes roughly quadratic time in the number of digits. More advanced algorithms using Fourier transforms can do better, but it’s still a non-trivial task.
  • Encoding 2: Prime Factorization. What if the numbers were given as lists of their prime factors? If $a = p_1^{e_1} \cdots p_k^{e_k}$ and $b = p_1^{f_1} \cdots p_k^{f_k}$ (allowing zero exponents), then $a \cdot b = p_1^{e_1+f_1} \cdots p_k^{e_k+f_k}$: to multiply them, you just add the exponents of the common primes. The hard work of multiplication becomes the simple work of addition (see the sketch below).
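
A minimal Python sketch of this idea (the representation as a {prime: exponent} dictionary is my own choice for illustration):

    from collections import Counter

    def multiply_factored(a, b):
        # Multiply two numbers given as {prime: exponent} dictionaries
        # by adding exponents -- no big-integer arithmetic needed.
        return dict(Counter(a) + Counter(b))

    # 12 = 2^2 * 3 and 90 = 2 * 3^2 * 5
    print(multiply_factored({2: 2, 3: 1}, {2: 1, 3: 2, 5: 1}))
    # {2: 3, 3: 3, 5: 1}, i.e. 2^3 * 3^3 * 5 = 1080 = 12 * 90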

Example 2: Factorization

  • Problem: Find the prime factors of a number.
  • Encoding 1: Decimal Representation. Given “1147”, finding that it’s $31 \cdot 37$ is hard. For large numbers, this is the basis of modern cryptography. It’s considered a computationally “hard” problem.
  • Encoding 2: Prime Factorization. If the input is already given as "$31 \cdot 37$", the problem is trivial. You just read the answer.

The lesson is clear: an algorithm’s job is to make implicit information explicit. The difficulty of this task depends entirely on how well the information is hidden in the input encoding. Therefore, a formal problem definition must always specify the encoding scheme.

Types of Problems

With our informal definition of an algorithm as a program that always halts and produces a unique output, we can see that any algorithm realizes a function: it maps an input word to an output word.

Let’s categorize the main types of problems we’ll encounter.

1. Decision Problems

These are the simplest and most fundamental problems: the answer is just “yes” or “no”.

  • Formal Definition: A decision problem is defined by a pair $(\Sigma, L)$, where $\Sigma$ is an input alphabet and $L \subseteq \Sigma^*$ is a language.
  • The Task: Given an input word $x \in \Sigma^*$, decide if $x \in L$.
  • An Algorithm $A$ solves the problem if: $A$ outputs “yes” on every $x \in L$ and “no” on every $x \in \Sigma^* \setminus L$ (a small sketch follows this list).
  • Terminology: We say that the algorithm recognizes the language $L$. A language for which such an algorithm exists is called recursive. This is a historical term; for now, just think of it as meaning “algorithmically decidable.”
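
A minimal sketch of a decision algorithm; the language here, binary palindromes, is my own illustrative choice, not one from the text:

    def decide_palindromes(x: str) -> bool:
        # Decision algorithm for L = {w in {0,1}* : w reads the same forwards and backwards}.
        # Returns True for "yes" (x is in L) and False for "no".
        return x == x[::-1]

    print(decide_palindromes("0110"))  # True:  "0110" is in L
    print(decide_palindromes("01"))    # False: "01" is not in L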

2. Function Problems

Here, the goal is to compute an output based on an input.

  • Formal Definition: A function problem is defined by a function $f: \Sigma_1^* \to \Sigma_2^*$ (for an input alphabet $\Sigma_1$ and an output alphabet $\Sigma_2$).
  • The Task: Given an input word $x \in \Sigma_1^*$, compute the output word $f(x)$.
  • An Algorithm $A$ computes the function $f$ if: for every $x$ in the domain of $f$, $A(x) = f(x)$ (see the sketch below).
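
The multiplication problem from Example 1 fits here. A sketch, assuming decimal strings as the encoding and glossing over how the two inputs are paired into a single word:

    def multiply_decimal(x: str, y: str) -> str:
        # Function problem: inputs and output are decimal strings (the encoding matters!).
        # Python's built-in big integers do the actual work here.
        return str(int(x) * int(y))

    print(multiply_decimal("123", "456"))  # "56088"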

3. Relation Problems

This is a generalization of function problems. Sometimes, there isn’t a single correct answer, but a set of acceptable ones.

  • Formal Definition: A relation problem is defined by a relation $R \subseteq \Sigma_1^* \times \Sigma_2^*$.
  • The Task: Given an input $x$, find any output $y$ such that the pair $(x, y)$ is in the relation $R$.
  • Examples:
    • Factoring: Given a composite number $n$, find a non-trivial factor $d$ (so $1 < d < n$ and $d$ divides $n$). There might be many, but finding any one of them solves the problem (see the sketch after this list).
    • Optimization (e.g., TSP): Given a weighted graph $G$, find an optimal tour $T$. There can be exponentially many optimal tours; we are happy with any single one.
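
A minimal trial-division sketch for the factoring example (fine for small numbers; cryptographic-size inputs are exactly where this approach becomes infeasible):

    def some_factor(n):
        # Relation problem: return ANY non-trivial factor of n, or None if n is prime.
        d = 2
        while d * d <= n:
            if n % d == 0:
                return d
            d += 1
        return None

    print(some_factor(1147))  # 31, since 1147 = 31 * 37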

4. Generation and Enumeration Problems

Sometimes the task is not to process an input, but to create something from scratch.

  • Generation Problem: The algorithm takes no input (or, formally, the empty word as input) and produces a single, specific word $w$.
  • Enumeration Problem: For an infinite language $L$, the algorithm takes an integer $n$ as input and outputs the $n$-th word of $L$ (according to the canonical ordering). For example, an algorithm that, given $n = 5$, outputs the 5th prime number, “11” (see the sketch below).
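
A naive sketch of that enumeration example (trial division, purely to illustrate the problem type, not an efficient method):

    def nth_prime(n):
        # Enumeration problem: output the n-th prime (1-indexed), as a decimal string.
        count, candidate = 0, 1
        while count < n:
            candidate += 1
            if all(candidate % d != 0 for d in range(2, int(candidate ** 0.5) + 1)):
                count += 1
        return str(candidate)

    print(nth_prime(5))  # "11"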

What is Information?

We’ve set up the framework for problems. Now let’s tackle a deeper, more philosophical question that lies at the heart of computer science: What is information?

This concept is fundamental not just to us, but to physics (quantum mechanics is a theory of information processing), biology (DNA is a program for building an organism), and the humanities (information as knowledge transfer).

The humanities often use a subjective definition: information is what you don’t already know. This is useless for building a formal theory. We need an objective measure. When we look at a string of bits, we want to be able to say, “This string contains X amount of information,” regardless of who is looking at it.

The key intuition, developed over centuries, is that information content is deeply connected to compressibility.

  • A message with low information content should be highly compressible. It’s simple, regular, and predictable.
  • A message with high information content should be incompressible. It’s complex, irregular, and random-looking.

Shannon’s Entropy: Information in a Random Source

The first great mathematical theory of information was developed by Claude Shannon. His idea was to measure the information content not of a single object, but of the random source that produces it.

Imagine a text composed of four letters: A, B, C, D.

  • Frequencies: Suppose in a long text, ‘A’ appears 50% of the time, ‘B’ 25%, and ‘C’ and ‘D’ 12.5% each.
  • Encoding: To encode this text in binary, a naive approach uses a fixed length for each letter (e.g., A=00, B=01, C=10, D=11), requiring 2 bits per character.
  • A Better Idea (Huffman Coding): Why waste bits? Let’s give the most frequent character the shortest code. A prefix-free code could be: A=0, B=10, C=110, D=111.
  • Expected Length: The average number of bits per character is now $0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75$ bits. We’ve compressed the text.

Shannon’s entropy formula, $H = -\sum_i p_i \log_2 p_i$ (where $p_i$ is the probability of the $i$-th symbol), quantifies this exact idea. It gives the theoretical lower bound on the average number of bits per symbol needed to encode messages from a given random source.
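
A quick numerical check for the distribution above; this just plugs the numbers from the example into the formula, it does not construct the code itself:

    import math

    probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
    code_lengths = {"A": 1, "B": 2, "C": 3, "D": 3}  # A=0, B=10, C=110, D=111

    expected_length = sum(probs[s] * code_lengths[s] for s in probs)
    entropy = -sum(p * math.log2(p) for p in probs.values())

    print(expected_length)  # 1.75 bits per symbol
    print(entropy)          # 1.75 bits: this code meets Shannon's lower bound exactly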

The Limitation: Shannon’s theory is brilliant, but it has a crucial blind spot. It only cares about the statistical frequency of symbols, not their structure. Consider these two strings, both of length 32 with the same character counts as our example:

  1. AAAAAAAAAAAAAAAABBBBBBBBCCCCDDDD
  2. ACBDAB... (a random-looking sequence)

To Shannon’s entropy, these two strings are identical. But intuitively, the first string contains almost no information. We can describe it simply as “16 A’s, 8 B’s, 4 C’s, 4 D’s”. The second string seems to contain much more information; there’s no obvious pattern, so we might have to write it out in full.

Shannon measures the information of the source, not the object. We need something more.

Kolmogorov Complexity: The Information in a Single Object

In the 1960s, Andrey Kolmogorov developed a stunningly simple and powerful definition that captures the information content of a single, specific object. The idea goes back to our intuition:

The information content of an object is the length of its shortest possible description.

But what is a “description”? If we allow a custom “decompressor” for every object, we run into a paradox: we could define a decompressor that takes the input “0” and outputs our object. Then every object could be “compressed” to a single bit! This is useless because the complexity is just hidden in the decompressor.

Kolmogorov’s genius was to realize that a program is the ultimate decompressor.

The Kolmogorov Complexity of a binary string $x$, denoted $K(x)$, is the length of the shortest program, in a fixed universal programming language (like Python or a Turing Machine), that generates $x$ as output and then halts.

This definition is profound. It elegantly sidesteps the decompressor problem by including the decompressor’s description in the length of the program itself.

An Upper Bound: The complexity of a string can’t be much larger than the string itself. We can always write a trivial program:

print("0110101110...")

The length of this program is the length of the string plus some constant overhead for the print command. So, for any string $x$: $K(x) \le |x| + c$, where $c$ is a constant that does not depend on $x$.

The Power of Computation: Now consider the string $0^{2^n}$ (a sequence of $2^n$ zeros).

  • The trivial program print("00...0") would have length about $2^n$ (plus a small constant).
  • But we can do much better!
    n = 5 # or whatever n we need
    k = 2**n
    print('0' * k)

The length of this program depends on the length of the binary representation of $n$, which is about $\log_2 n$ bits. The massive amount of memory and time needed to compute and store $2^n$ zeros does not count towards the program’s length. The Kolmogorov complexity is about the size of the description, not the resources used in execution.

So, $K(0^{2^n}) \le \log_2 n + c$, which is tiny compared to the string’s length $2^n$. This confirms our intuition: regular, patterned objects have low information content. A truly random string, by contrast, has no pattern, and the shortest program to generate it is essentially print("the_string"). Its complexity is approximately its own length.

This gives us the ultimate definition of randomness: a string is random if it is incompressible.
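
$K(x)$ itself cannot be computed (as noted below), but a general-purpose compressor gives a crude, computable stand-in for this intuition: the compressed length is an upper bound on how short a description can be, nothing more. A small experiment, with zlib as one convenient choice of compressor:

    import os
    import zlib

    regular = b"0" * 1024            # highly patterned
    random_bytes = os.urandom(1024)  # no usable pattern

    print(len(zlib.compress(regular)))       # a handful of bytes: highly compressible
    print(len(zlib.compress(random_bytes)))  # a bit more than 1024 bytes: essentially incompressible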

Kolmogorov complexity is the best definition of information we have. It’s a beautiful, powerful theoretical tool. Its one major drawback? It is uncomputable. There is no algorithm that can calculate $K(x)$ for an arbitrary string $x$. We will prove this astonishing fact later in the course.

Continue here: 04 Kolmogorov Complexity and the Nature of Randomness