Lecture from: 15.04.2025 | Video: Homelab

We’ve seen how Bloom filters provide a space-efficient way to check for set membership with a controllable false positive rate. Let’s compare them more directly with hash tables and then move on to another classic algorithm and a very important application of randomized algorithms: primality testing.

Comparison: Hash Table vs. Bloom Filter

Structure and Information Stored

Observation: When a Bloom filter uses only one hash function ($k = 1$), its structure becomes somewhat similar to a hash table in which we only mark whether a hash bucket is occupied, without storing the actual element or resolving collisions by chaining/probing.

Key Difference (for $k = 1$ and beyond):

  • Bloom Filter: Stores only a “yes/no” indication for each bit in its array $B$. If $B[i] = 1$, it means some element hashing to this position (or to one of its $k$ positions if $k > 1$) has been seen. It doesn’t store which specific element or how many.
  • Hash Table (for set membership): Typically stores the actual elements (or pointers to them, or their original indices if we are checking for duplicates from a specific input sequence). This allows it to resolve collisions definitively and avoid false positives. When used for finding duplicates, it stores the original index of the element that hashed to a bucket.
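To make the contrast concrete, here is a minimal Bloom-filter sketch in Python. The salted-SHA-256 construction of the $k$ hash functions is one arbitrary choice of mine, not the lecture’s; a plain Python set would answer the same membership queries exactly, but has to store the elements themselves:

import hashlib

class BloomFilter:
    """Bit array of size m with k hash functions; stores only yes/no bits."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = [0] * m

    def _positions(self, x):
        # Derive k array positions from salted SHA-256 digests.
        for salt in range(self.k):
            h = hashlib.sha256(f"{salt}:{x}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, x):
        for p in self._positions(x):
            self.bits[p] = 1

    def might_contain(self, x):
        # False positives are possible; false negatives are not.
        return all(self.bits[p] for p in self._positions(x))

bf = BloomFilter(m=1000, k=3)
bf.add("alice")
print(bf.might_contain("alice"))  # True
print(bf.might_contain("bob"))    # almost surely False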

Collision Probabilities and Space

Let’s consider the scenario where we want to keep the expected number of “undesired events” (collisions for the hash table, false positives for the Bloom filter) below a fixed constant.

Floyd’s Cycle-Finding Algorithm (“Tortoise and Hare”)

This algorithm is used to find a duplicate number in an array where each value is treated as a pointer to the next index, forming a functional graph (a linked-list-like structure). The duplicate corresponds to the entry point of a cycle in this structure.

Check out the video below which explains this in a much better way imo…

Diagram

You can visualize this with the classic rho (ρ) structure:

[Start] → → → [Cycle Start] → → → [Meeting Point]
                    ↑                     │
                    └───── ← ← ← ← ← ─────┘
       stem (k)              cycle (ℓ)

Assumptions

You are given an array $a[1 \ldots n]$ with values in the range $\{1, \ldots, n-1\}$. By the pigeonhole principle, there is at least one duplicate value.

We define a function:

$$f(i) = a[i]$$

This defines a sequence:

$$x_0 = n, \qquad x_{k+1} = f(x_k) = a[x_k]$$

Since no entry of $a$ equals $n$, the sequence never returns to its starting point, so it eventually enters a cycle (the classic rho shape). Our goal is to find two distinct indices $i \neq j$ such that:

$$a[i] = a[j]$$
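For intuition, here is a tiny concrete instance (the array values are my own illustration, not from the lecture):

# a[1..5] with values in {1, ..., 4}; index 0 is unused padding
a = [None, 4, 3, 1, 2, 3]

x = 5  # x_0 = n
for _ in range(8):
    print(x, end=" -> ")
    x = a[x]
print(x)  # 5 -> 3 -> 1 -> 4 -> 2 -> 3 -> 1 -> 4 -> 2: stem (5), cycle (3 1 4 2)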

Step-by-Step Algorithm

Phase 1: Detect Cycle (Find Meeting Point)

Use two pointers:

  • tortoise: moves 1 step at a time: tortoise = a[tortoise]
  • hare: moves 2 steps at a time: hare = a[a[hare]]

Initialize:

tortoise = a[n]
hare = a[a[n]]

Loop until they meet:

while tortoise != hare:
    tortoise = a[tortoise]
    hare = a[a[hare]]

At this point, both pointers are inside the cycle and at the same position: once both are in the cycle, the hare closes the gap to the tortoise by exactly one step per iteration (mod the cycle length $\ell$), so they must meet.

Phase 2: Find Start of Cycle (Duplicate Value)

Reset one pointer (tortoise) to the beginning:

tortoise = n

Move both pointers one step at a time:

while tortoise != hare:
    tortoise = a[tortoise]
    hare = a[hare]

They now meet at the start of the cycle, which corresponds to the duplicate value:

duplicate_value = hare  # or tortoise

Phase 3: Find Indices $i \neq j$ such that $a[i] = a[j]$

Find the first index i (the predecessor of the cycle start on the path from the start):

i = n
while a[i] != duplicate_value:
    i = a[i]

Find the second index j (the predecessor of the cycle start inside the cycle):

j = a[duplicate_value]  # move one step into the cycle
while a[j] != duplicate_value:
    j = a[j]

Now you have two distinct indices with:

$$a[i] = a[j] = \text{duplicate\_value}$$

($i$ lies on the stem, $j$ inside the cycle, so $i \neq j$.) Return the pair, sorted if needed:

return (i, j)

Why It Works

I didn’t like the way it was proven in the lecture, so I’ve attached a more intuitive proof below…

Time and Space Complexity

  • Time: $O(n)$ — each pointer traverses at most $O(n)$ steps across all three phases
  • Space: $O(1)$ — only a constant number of pointers is used
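Putting the three phases together, here is a runnable sketch in Python, assuming the conventions above (1-indexed array $a[1 \ldots n]$ with values in $\{1, \ldots, n-1\}$; the function name find_duplicate_pair is mine):

def find_duplicate_pair(a, n):
    """Floyd's tortoise-and-hare on a[1..n] with values in {1, ..., n-1}.

    Returns indices (i, j), i != j, with a[i] == a[j]. a[0] is unused padding.
    """
    # Phase 1: find a meeting point inside the cycle.
    tortoise, hare = a[n], a[a[n]]
    while tortoise != hare:
        tortoise = a[tortoise]
        hare = a[a[hare]]

    # Phase 2: find the start of the cycle (= the duplicate value).
    tortoise = n
    while tortoise != hare:
        tortoise = a[tortoise]
        hare = a[hare]
    duplicate_value = hare

    # Phase 3: find the two predecessors of the cycle start.
    i = n                            # predecessor on the stem
    while a[i] != duplicate_value:
        i = a[i]
    j = a[duplicate_value]           # predecessor inside the cycle
    while a[j] != duplicate_value:
        j = a[j]
    return (i, j)

a = [None, 4, 3, 1, 2, 3]            # same example array as above
print(find_duplicate_pair(a, 5))     # (5, 2): a[5] == a[2] == 3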

Primality Testing

Problem

Given an integer $n \geq 2$, determine whether $n$ is a prime number (i.e., has no divisors other than 1 and itself).

Application:

Efficiently generating large random prime numbers is critical in cryptography, especially in RSA encryption. For example, RSA often uses primes with around 1000 bits.

The standard approach:

  1. Generate a random large number $n$.
  2. Test if $n$ is prime.
  3. If it is, use it. Otherwise, repeat.

By the Prime Number Theorem, the density of primes around a large number $n$ is approximately:

$$\frac{1}{\ln n}$$

For a 1000-bit number $n$, we estimate $\ln n \approx 1000 \cdot \ln 2 \approx 693$. So, roughly 1 in every 693 such numbers is prime. Thus, the primality test must be very efficient, as we’ll need to test hundreds of candidates on average.
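A sketch of this generate-and-test loop in Python (is_probably_prime is a placeholder for the tests discussed below, e.g. Miller-Rabin; the sampling details are my own):

import random

def random_prime(bits, is_probably_prime):
    """Sample random odd numbers of the given bit length until one passes the test."""
    while True:
        # Force the top bit (exact bit length) and the bottom bit (odd).
        cand = random.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probably_prime(cand):
            return cand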

A Brief History of Primality Testing

  • ~200 BC (Eratosthenes): Sieve of Eratosthenes: finds all primes $\leq N$. Efficient for small $N$, but not polynomial in the input size (the number of bits, $\log N$).

  • 1976 (Miller, Rabin): Introduced a Monte Carlo primality test with one-sided error:

    • If $n$ is prime, it always says “prime”.
    • If $n$ is composite, it says “composite” with high probability ($\geq 3/4$). (False positives can occur, but their probability can be made arbitrarily small through repetition.)
  • 1977 (Solovay, Strassen): Another Monte Carlo test, based on quadratic residues.

  • 1987 (Adleman, Huang): A different probabilistic approach that never falsely accepts composites, but might wrongly reject a prime.

  • 2002 (Agrawal, Kayal, Saxena - AKS): Developed the first deterministic, polynomial-time primality test. A major theoretical milestone, but still too slow in practice for cryptographic sizes compared to randomized methods like Miller-Rabin.

Basic Primality Tests (Flawed or Inefficient)

1. Naive Trial Division:

Check if any $d \in \{2, 3, \ldots, \lfloor \sqrt{n} \rfloor\}$ divides $n$ (see the sketch after the list below).

  • Problem: Too slow - takes up to $\Theta(\sqrt{n})$ divisions.
  • For $n \approx 2^{1000}$, this is computationally infeasible.
  • We need algorithms that run in time polynomial in $\log n$.
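For reference, a direct implementation - correct, but with $\Theta(\sqrt{n})$ running time, i.e. exponential in the bit length of $n$:

import math

def is_prime_trial(n):
    """Deterministic trial division; infeasible for 1000-bit n."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True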

2. Randomized Trial Division:

  1. Choose $d \in \{2, \ldots, \lfloor \sqrt{n} \rfloor\}$ uniformly at random.
  2. If $d \mid n$, return “composite”; else return “prime”.
  • Problem: If $n$ is composite but has only large prime factors, the test is unlikely to find a divisor, leading to high error probability (see the sketch below).
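A sketch of this test (assuming $n \geq 4$). For $n = pq$ with two large primes, the only useful draw below $\sqrt{n}$ is $p$ itself, so a single round succeeds with probability only about $1/\sqrt{n}$:

import math
import random

def randomized_trial_division(n):
    """One random divisor check; near-useless when n has only large prime factors."""
    d = random.randrange(2, math.isqrt(n) + 1)  # assumes n >= 4
    return "composite" if n % d == 0 else "prime"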

Euclid-Based Primality Test

Key Insight:

If $n$ is prime, then for all $a \in \{1, \ldots, n-1\}$,

$$\gcd(a, n) = 1$$

These $a$’s form the multiplicative group $\mathbb{Z}_n^*$, which has $\varphi(n)$ elements.

  • If $n$ is prime, then $\varphi(n) = n - 1$.

EUCLID-PRIMZAHLTEST Algorithm:

  1. Pick $a \in \{1, \ldots, n-1\}$ uniformly at random.
  2. Compute $d = \gcd(a, n)$.
  3. If $d > 1$: return “composite”.
  4. If $d = 1$: return “prime”.
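A direct transcription of one round in Python:

import math
import random

def euclid_primality_test(n):
    """EUCLID-PRIMZAHLTEST: one gcd check against a random a."""
    a = random.randrange(1, n)  # a in {1, ..., n-1}
    if math.gcd(a, n) > 1:
        return "composite"      # a shares a factor with n: certainly composite
    return "prime"              # gcd is 1: no evidence, possibly a false positive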

Analysis:

  • If $n$ is prime: $\gcd(a, n) = 1$ for all $a \in \{1, \ldots, n-1\}$. So the test always correctly returns “prime” - no false negatives.

  • If $n$ is composite:

    • The test returns “prime” (a false positive) exactly when the random $a$ is coprime to $n$.

    • The error probability is:

$$\Pr[\text{error}] = \frac{\varphi(n)}{n - 1}$$

Example: RSA-type numbers

Let $n = p \cdot q$, where $p$, $q$ are large distinct primes:

  • Then $\varphi(n) = (p-1)(q-1)$.

  • So,

$$\Pr[\text{error}] = \frac{(p-1)(q-1)}{pq - 1} \approx \left(1 - \frac{1}{p}\right)\left(1 - \frac{1}{q}\right),$$

    which is very close to 1 for large $p$ and $q$. That means most random $a$’s will not reveal that $n$ is composite.

Example (concrete numbers for illustration): $n = 101 \cdot 103 = 10403$.
  • Then $\varphi(n) = 100 \cdot 102 = 10200$,
  • So error rate $\approx 10200/10402 \approx 0.98$, again close to 1.
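A brute-force sanity check of this rate for the illustrative $n$ above:

from math import gcd

p, q = 101, 103
n = p * q
coprime = sum(1 for a in range(1, n) if gcd(a, n) == 1)
print(coprime, (p - 1) * (q - 1))  # both 10200: count matches phi(n)
print(coprime / (n - 1))           # ~0.9806: the false-positive rate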

Conclusion: This test is not reliable for composite numbers - especially those used in cryptography.

We need tests with guaranteed low error probabilities (e.g., $\leq 1/4$), regardless of $n$’s factorization.

Looking Ahead

In the next lecture, we’ll explore probabilistic primality tests like Miller-Rabin, which are far more robust. These tests are based on Fermat’s Little Theorem and deeper properties of modular arithmetic, offering significantly lower error rates for composite numbers - even those used in cryptography.

Continue here: 18 Fermat and Miller-Rabin Primality Tests