Computers work with text. Programs are text, data is stored as text (sequences of bits), and inputs/outputs are essentially text. This chapter introduces the formal language we need to talk about these “texts” precisely. This formalism is the bedrock upon which we will build all the core concepts of theoretical computer science.

2.1 Aims

This chapter has three main goals:

  1. Introduce the fundamental vocabulary: We will define the core concepts of Alphabet, Word, and Language. These are the basic building blocks for describing information and computation.
  2. Formalize Algorithmic Problems: We will use this new vocabulary to precisely define what an “algorithmic problem” is. We will focus on two crucial classes: Decision Problems and Optimization Problems.
  3. Measure Information Content: We will explore the concept of text compressibility by introducing Kolmogorov Complexity. This powerful idea allows us to measure the information content of a word and provides a formal definition for the elusive concept of randomness.

2.2 Alphabets, Words, and Languages

Just like natural languages, the languages of computation start with symbols.

Alphabets and Words

Definition 2.1: Alphabet

An alphabet is a finite, non-empty set, usually denoted by $\Sigma$. The elements of an alphabet are called letters, characters, or symbols.

Examples:

  • The Boolean alphabet: $\Sigma_{\text{bool}} = \{0, 1\}$.
  • The Latin alphabet: $\Sigma_{\text{lat}} = \{a, b, c, \dots, z\}$.
  • A keyboard alphabet: e.g., $\Sigma_{\text{keyboard}} = \Sigma_{\text{lat}} \cup \{A, \dots, Z, ␣, !, ?, \dots\}$, where ␣ is the space symbol.
  • The alphabet for m-adic numbers: $\Sigma_m = \{0, 1, \dots, m-1\}$ for any $m \geq 2$.

Definition 2.2: Word

A word (or string) over an alphabet $\Sigma$ is a finite sequence of letters from $\Sigma$.

  • The empty word, denoted by $\lambda$ (or $\varepsilon$), is the sequence with zero letters.
  • The length of a word $w$, denoted $|w|$, is the number of letters in the sequence.
  • $\Sigma^*$ is the set of all possible words over $\Sigma$.
  • $\Sigma^+ = \Sigma^* \setminus \{\lambda\}$ is the set of all non-empty words.

For example, 010011 is a word over $\Sigma_{\text{bool}}$ with length $|010011| = 6$. The set $(\Sigma_{\text{bool}})^*$ contains all possible binary strings: $\{\lambda, 0, 1, 00, 01, 10, 11, 000, \dots\}$.

Representing Objects as Words

We can use words to represent virtually any object of interest in computation.

  • Numbers: A binary word $x = x_1 x_2 \dots x_n \in (\Sigma_{\text{bool}})^*$ can represent the natural number $\mathrm{Number}(x) = \sum_{i=1}^{n} 2^{n-i} \cdot x_i$. We denote the standard binary representation of a number $m$ as $\mathrm{Bin}(m)$.
  • Sequences of Numbers: A sequence of numbers $m_1, m_2, \dots, m_k$ can be represented as a single word using a separator symbol, e.g., $\mathrm{Bin}(m_1)\#\mathrm{Bin}(m_2)\#\cdots\#\mathrm{Bin}(m_k)$ over the alphabet $\{0, 1, \#\}$.
  • Graphs: A directed graph $G$ with $n$ vertices can be represented by its adjacency matrix $M_G = (a_{ij})$, where $a_{ij} = 1$ iff $G$ contains the edge $(v_i, v_j)$. By writing out the rows of the matrix separated by a #, we get a single word representing the graph. For example, the graph in Figure 2.1 has the adjacency matrix $M_G = \begin{pmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$. This can be encoded as the word 0011#0011#0101#0000# (a small encoder sketch follows this list).
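
As a concrete illustration, here is a minimal Python sketch of this adjacency-matrix encoding (the function name `encode_graph` is ours, not a fixed convention):

```python
def encode_graph(adj):
    """Encode an adjacency matrix (a list of rows of 0/1 entries) as a word
    over {0, 1, #}, writing out the rows each terminated by #."""
    return "".join("".join(str(bit) for bit in row) + "#" for row in adj)

# The adjacency matrix of the example graph above:
M = [[0, 0, 1, 1],
     [0, 0, 1, 1],
     [0, 1, 0, 1],
     [0, 0, 0, 0]]
print(encode_graph(M))  # -> 0011#0011#0101#0000#
```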

Operations on Words and Subwords

Definition 2.3: Concatenation

The concatenation of two words $x$ and $y$, written $xy$ (or $x \cdot y$), is the word formed by appending $y$ to the end of $x$.

Concatenation is associative, i.e., $(xy)z = x(yz)$. The set $\Sigma^*$ of all words with the concatenation operation forms a mathematical structure called a monoid, with the empty word as the identity element ($\lambda w = w\lambda = w$).

Definition 2.6: Subwords

Let $v$ and $w$ be words over $\Sigma$.

  • $v$ is a substring (Teilwort) of $w$ if $w = xvy$ for some words $x, y \in \Sigma^*$.
  • $v$ is a prefix of $w$ if $w = vy$ for some word $y \in \Sigma^*$.
  • $v$ is a suffix of $w$ if $w = xv$ for some word $x \in \Sigma^*$.

For example, in abcabcabc, abc is a prefix, cabc is a substring, and bc is a suffix.
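
In Python, these three relations correspond directly to built-in string operations; a minimal sketch:

```python
def is_prefix(v, w):    return w.startswith(v)  # w = v·y for some y
def is_suffix(v, w):    return w.endswith(v)    # w = x·v for some x
def is_substring(v, w): return v in w           # w = x·v·y for some x, y

w = "abcabcabc"
print(is_prefix("abc", w), is_substring("cabc", w), is_suffix("bc", w))
# -> True True True
```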

Languages

Definition 2.9: Language

A language over an alphabet $\Sigma$ is any subset $L \subseteq \Sigma^*$.

This is a very general definition. Any set of strings is a language.

Examples of languages over $\Sigma = \{a, b\}$:

  • $\emptyset$ (the empty language)
  • $\{\lambda\}$ (the language containing only the empty word; note that $\{\lambda\} \neq \emptyset$)
  • $\{w \in \{a, b\}^* : |w|_a = |w|_b\}$ (all words with an equal number of a’s and b’s, where $|w|_a$ denotes the number of occurrences of $a$ in $w$)

We can perform operations on languages:

  • Concatenation: $L_1 \cdot L_2 = \{vw : v \in L_1, w \in L_2\}$
  • Kleene Star: $L^* = \bigcup_{i \geq 0} L^i$, where $L^0 = \{\lambda\}$ and $L^{i+1} = L^i \cdot L$. The Kleene star represents repeating words from the language zero or more times.

Distributivity of Language Operations

Concatenation distributes over union, but not over intersection.

  • Union (holds): $L_1(L_2 \cup L_3) = L_1L_2 \cup L_1L_3$
  • Intersection (does not hold): In general, $L_1(L_2 \cap L_3) \neq L_1L_2 \cap L_1L_3$; we only have the inclusion $L_1(L_2 \cap L_3) \subseteq L_1L_2 \cap L_1L_3$. Why? A word in $L_1L_2 \cap L_1L_3$ might be formed by concatenating a word from $L_1$ with one from $L_2$, and also, via a different decomposition, by concatenating another word from $L_1$ with one from $L_3$; nothing forces the two decompositions to coincide. For example, take $L_1 = \{\lambda, a\}$, $L_2 = \{\lambda\}$, $L_3 = \{a\}$: then $L_1(L_2 \cap L_3) = L_1 \cdot \emptyset = \emptyset$, while $L_1L_2 \cap L_1L_3 = \{\lambda, a\} \cap \{a, aa\} = \{a\}$ (verified in the code sketch below).
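
For finite languages these operations are easy to check mechanically; a minimal Python sketch, with sets of strings standing in for languages and `""` for the empty word:

```python
def concat(L1, L2):
    """Concatenation L1 · L2 = { vw : v in L1, w in L2 } of finite languages."""
    return {v + w for v in L1 for w in L2}

# The counterexample to distributivity over intersection from above:
L1, L2, L3 = {"", "a"}, {""}, {"a"}
lhs = concat(L1, L2 & L3)               # L1 · (L2 ∩ L3) = L1 · ∅ = ∅
rhs = concat(L1, L2) & concat(L1, L3)   # {"", "a"} ∩ {"a", "aa"} = {"a"}
print(lhs, rhs)                         # -> set() {'a'}
```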

2.3 Algorithmic Problems

Using our new vocabulary, we can now formalize what we mean by an “algorithmic problem.” We’ll start with the most fundamental type.

Decision Problems

Definition 2.11: Decision Problem

A decision problem is a pair $(\Sigma, L)$, where $\Sigma$ is an alphabet and $L \subseteq \Sigma^*$ is a language.
The problem is to determine, for any given word $x \in \Sigma^*$, whether or not $x$ is in $L$.

An algorithm $A$ solves (or decides) the problem $(\Sigma, L)$ if it halts on every input $x \in \Sigma^*$ and outputs “yes” (or 1) if $x \in L$, and “no” (or 0) if $x \notin L$.

A language $L$ for which such an algorithm exists is called recursive or decidable.

Intuition

Think of a decision problem as a membership test:

  • The alphabet $\Sigma$ tells us what symbols can appear in a valid input.
  • Each word $x \in \Sigma^*$ is just a possible encoding of an input.
  • The language $L \subseteq \Sigma^*$ is the set of all yes-instances of the problem.
    So solving the problem = checking whether the given word $x$ is in $L$.

Examples:

  • PRIMALITY:

    • Alphabet: $\Sigma_{\text{bool}} = \{0, 1\}$
    • Word: $x \in \{0, 1\}^*$, interpreted as $\mathrm{Bin}(n)$ for a natural number $n$
    • Language: $L_{\text{PRIM}} = \{\mathrm{Bin}(n) : n \text{ is prime}\}$
    • Question: “Is 13 prime?” ⇨ check if $\mathrm{Bin}(13) = 1101 \in L_{\text{PRIM}}$ (a decider sketch follows these examples)
  • Hamiltonian Cycle (HK):

    • Alphabet: $\{0, 1, \#\}$
    • Word: a binary encoding of a graph, e.g., 011#101#...
    • Language: all encodings of graphs that have a Hamiltonian cycle.
    • Question: “Does this graph have a Hamiltonian cycle?”
  • SAT (Satisfiability):

    • Alphabet: symbols for encoding Boolean formulas, e.g., $\{0, 1, x, (, ), \wedge, \vee, \neg\}$
    • Word: a formula $\Phi$ encoded as a string
    • Language: all formulas that are satisfiable.
    • Question: “Is this formula satisfiable?”
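
To make the notion of a decider concrete, here is a minimal Python sketch for PRIMALITY using trial division; what matters for decidability is that it halts on every input with the correct answer, not that it is fast:

```python
def decide_primality(x: str) -> bool:
    """Decide x ∈ L_PRIM = { Bin(n) : n is prime } for a word x over {0, 1}."""
    if not x or any(c not in "01" for c in x):
        return False                    # not a word over the input alphabet
    if x != "0" and x[0] == "0":
        return False                    # leading zeros: not of the form Bin(n)
    n = int(x, 2)
    if n < 2:
        return False
    d = 2
    while d * d <= n:                   # trial division up to sqrt(n)
        if n % d == 0:
            return False
        d += 1
    return True

print(decide_primality("1101"))  # Bin(13): 13 is prime -> True
print(decide_primality("1111"))  # Bin(15): 15 = 3 · 5  -> False
```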

Optimization Problems

Many real-world problems are not simple yes/no questions. They involve finding the best solution among many possibilities.

Definition 2.14: Optimization Problem

An optimization problem is formally a 6-tuple $U = (\Sigma_I, \Sigma_O, L, \mathcal{M}, \mathrm{cost}, \mathrm{goal})$, where:

  • $\Sigma_I$ and $\Sigma_O$ are the input and output alphabets.
  • $L \subseteq \Sigma_I^*$ is the language of valid problem instances.
  • $\mathcal{M}$ maps each instance $x \in L$ to its set $\mathcal{M}(x)$ of feasible solutions.
  • $\mathrm{cost}$ assigns a cost $\mathrm{cost}(u, x)$ to each feasible solution $u \in \mathcal{M}(x)$.
  • $\mathrm{goal}$ is either Minimum or Maximum.

An algorithm $A$ solves $U$ if, for any instance $x \in L$, it returns a feasible solution $A(x) \in \mathcal{M}(x)$ with optimal cost.

Intuition

For optimization problems, the language $L$ just tells us what counts as a valid instance.
The “word” $x \in L$ is the encoding of that instance.
The solution set $\mathcal{M}(x)$ contains all possible valid answers, and the algorithm’s job is to pick one with the best cost according to the goal.

Example: Traveling Salesman Problem (TSP)

  • Instance Language $L$: All encodings of weighted complete graphs.
  • Word $x$: e.g., a matrix encoding of a weighted complete graph $G$.
  • Feasible Solutions $\mathcal{M}(x)$: All Hamiltonian cycles in $G$.
  • Cost: Total weight of the cycle.
  • Goal: Minimize cost (a brute-force sketch follows this list).
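
A minimal exhaustive-search sketch in Python, assuming the instance has already been decoded into a symmetric distance matrix `dist` rather than left as an encoded word; this is exact but takes $O(n!)$ time:

```python
from itertools import permutations

def tsp_bruteforce(dist):
    """Return a minimum-cost Hamiltonian cycle of the complete graph whose
    edge weights are given by the matrix dist, by trying all cycles."""
    n = len(dist)
    best_cost, best_tour = float("inf"), None
    for perm in permutations(range(1, n)):      # fix vertex 0 as the start
        tour = (0,) + perm
        cost = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_tour, best_cost

dist = [[0, 2, 9, 10],
        [2, 0, 6, 4],
        [9, 6, 0, 3],
        [10, 4, 3, 0]]
print(tsp_bruteforce(dist))  # -> ((0, 1, 3, 2), 18)
```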

Other examples:

  • Shortest Path:

    • Word = encoding of a graph + source + target.
    • $\mathcal{M}(x)$ = all paths between source and target.
    • Cost = length of path.
    • Goal = Minimum.
  • Knapsack:

    • Word = list of items with weights/values and a capacity.
    • $\mathcal{M}(x)$ = all subsets of items that fit within the capacity.
    • Cost = total value.
    • Goal = Maximum (a brute-force sketch follows this list).
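
A matching exhaustive-search sketch for Knapsack, again exact but exponential since all $2^n$ item subsets are tried:

```python
def knapsack_bruteforce(weights, values, capacity):
    """Return the best-value subset of items (as indices) whose total
    weight fits within capacity, by enumerating all subsets."""
    n = len(weights)
    best_value, best_subset = 0, []
    for mask in range(1 << n):                      # each bitmask = a subset
        chosen = [i for i in range(n) if mask >> i & 1]
        if sum(weights[i] for i in chosen) <= capacity:
            value = sum(values[i] for i in chosen)
            if value > best_value:
                best_value, best_subset = value, chosen
    return best_subset, best_value

print(knapsack_bruteforce([3, 4, 5], [4, 5, 7], capacity=8))  # -> ([0, 2], 11)
```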

Other Problem Types

  • Function Problems: Compute a function $f : \Sigma_1^* \to \Sigma_2^*$.

    • Example: FACTORING
      • Alphabet: $\{0, 1, \#\}$
      • Word: $\mathrm{Bin}(n)$ for a natural number $n \geq 2$
      • Function: $f(\mathrm{Bin}(n)) = \mathrm{Bin}(p_1)\#\mathrm{Bin}(p_2)\#\cdots\#\mathrm{Bin}(p_k)$, where $p_1 \leq p_2 \leq \cdots \leq p_k$ are the prime factors of $n$
      • Output is not just yes/no, but the actual factors.
  • Enumeration Problems: List all the words in a language $L$.

    • Example: ALL PRIMES
      • Alphabet: $\Sigma_{\text{bool}}$
      • Language: all encodings of prime numbers, $\{\mathrm{Bin}(p) : p \text{ is prime}\}$
      • Task: output $\mathrm{Bin}(2), \mathrm{Bin}(3), \mathrm{Bin}(5), \mathrm{Bin}(7), \dots$ in increasing order (a sketch follows below)
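
A minimal Python sketch of such an enumerator; it never halts, which is the point: an enumeration problem asks for an infinite output stream:

```python
from itertools import count

def enumerate_primes():
    """Print Bin(p) for every prime p, in increasing order; runs forever."""
    for n in count(2):
        if all(n % d != 0 for d in range(2, int(n ** 0.5) + 1)):
            print(bin(n)[2:])           # 10, 11, 101, 111, 1011, ...

# enumerate_primes()  # uncomment to run; the loop never terminates
```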

Intuition

  • Function problems: “don’t just tell me yes/no, give me the actual answer.”
  • Enumeration problems: “list all solutions, not just one.”

2.4 Kolmogorov Complexity

How can we measure the “information content” of a single word? The intuitive idea is that a string is simple (low information) if it has a short description, and complex (high information) if it doesn’t. The string $00\dots0$ ($n$ times) can be described as “$n$ zeros,” while a random-looking string seems to have no description shorter than the string itself.

This leads to Kolmogorov’s brilliant insight: the ultimate description of a string is a program that generates it.

Definition 2.17: Kolmogorov Complexity

The Kolmogorov complexity of a word $x \in (\Sigma_{\text{bool}})^*$, denoted $K(x)$, is the length of the shortest program (encoded as a binary string) in a fixed universal programming language (e.g., Pascal, Python) that generates $x$ as its output.

This definition is surprisingly robust.

  • Invariance Theorem: The choice of programming language doesn’t significantly change the complexity. For any two languages A and B, there is a constant $c$ such that for every string $x$, $|K_A(x) - K_B(x)| \leq c$. The complexity is an intrinsic property of the string, not of the language used to describe it.
  • Incompressible Strings: For every length $n$, there exists a string $x$ with $|x| = n$ such that $K(x) \geq |x| = n$. A simple counting argument shows this: there are $2^n$ binary strings of length $n$, but only $\sum_{i=0}^{n-1} 2^i = 2^n - 1$ binary programs of length less than $n$, and each program outputs at most one string. There simply aren’t enough short programs to describe every long string.
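
$K(x)$ itself is uncomputable, so no program can measure it exactly; but a general-purpose compressor conveys the same intuition, since a compressed file plus a decompressor is a description of the original. A rough illustration in Python (zlib is an arbitrary stand-in for “a short description”, not part of the definition):

```python
import os
import zlib

regular = b"0" * 1000        # highly regular: describable as "1000 zeros"
random_ = os.urandom(1000)   # random bytes: typically incompressible

print(len(zlib.compress(regular)))  # tiny, on the order of 20 bytes
print(len(zlib.compress(random_)))  # close to (or even above) 1000 bytes
```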

Applications of Kolmogorov Complexity

  1. Defining Randomness: Kolmogorov complexity gives us a formal way to define what it means for a string to be random.

    A string $x$ is random if it is incompressible, i.e., $K(x) \geq |x|$. This captures the idea of “no pattern” or “no simple description.”

  2. Proving Lower Bounds: It can be used to prove that certain problems are hard. For example, one can prove a weaker version of the Prime Number Theorem by arguing that if primes were too sparse, one could create short descriptions for many numbers (via their prime factorization), which would contradict the existence of incompressible numbers.

  3. Connecting Computability and Complexity: If a language $L$ is decidable, then the Kolmogorov complexity of its words (ordered canonically) cannot grow arbitrarily. Specifically, for the $i$-th word $z_i$ of $L$ in canonical order, we have $K(z_i) \leq \lceil \log_2(i+1) \rceil + c$ for a constant $c$ that depends only on $L$. This is because we can write a short program that, given $i$, enumerates all words in canonical order, tests each for membership in $L$ using a decider, and prints the $i$-th member; apart from the encoding of $i$, its length is constant. This provides a powerful tool for proving that certain languages are not decidable (a sketch of the enumerating program follows below).
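
A minimal Python sketch of the program behind this bound, assuming the decider for $L$ is supplied as a predicate `in_L` (a hypothetical name): apart from a constant amount of code, its input is just the index $i$, about $\log_2 i$ bits.

```python
from itertools import count, product

def ith_word(i, in_L, alphabet="01"):
    """Return the i-th word (1-indexed) of the language decided by in_L,
    in canonical order: shorter words first, equal lengths lexicographically."""
    seen = 0
    for length in count(0):
        for letters in product(alphabet, repeat=length):
            w = "".join(letters)
            if in_L(w):                 # uses the decider for L
                seen += 1
                if seen == i:
                    return w

# Example: L = words with an equal number of 0's and 1's.
print(ith_word(3, lambda w: w.count("0") == w.count("1")))  # -> "10"
```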

2.5 Summary and Outlook

This chapter laid the essential groundwork for the rest of our study.

  • We defined alphabets, words, and languages as the formal tools to describe data and problems.
  • We categorized algorithmic tasks into decision, optimization, and other types of problems.
  • We introduced Kolmogorov complexity as a robust measure of the information content of a string, which also provides a formal definition of randomness.

The study of these fundamental concepts is the core of Formal Language Theory, one of the oldest and most foundational areas of computer science. With this language in hand, we are now ready to explore the machines that process it, starting with the simplest model: the finite automaton.

Next Chapter: Chapter 3 - Finite Automata