Lecture from: 15.04.2025 | Video: Homelab
We’ve seen how Bloom filters provide a space-efficient way to check for set membership with a controllable false positive rate. Let’s compare them more directly with hash tables and then move on to another classic algorithm and a very important application of randomized algorithms: primality testing.
Comparison: Hash Table vs. Bloom Filter
Structure and Information Stored
Observation: When a Bloom filter uses only one hash function ($k = 1$), its structure becomes somewhat similar to a hash table where we only mark whether a hash bucket is occupied or not, without storing the actual element or resolving collisions by chaining/probing.
Key Difference (for $k = 1$ and beyond):
- Bloom Filter: Stores only a “yes/no” indication for each bit in its array $B$. If $B[i] = 1$, it means some element hashing to this position (or to one of its $k$ positions if $k > 1$) has been seen. It doesn’t store which specific element or how many.
- Hash Table (for set membership): Typically stores the actual elements (or pointers to them, or their original indices if we are checking for duplicates from a specific input sequence). This allows it to resolve collisions definitively and avoid false positives. When used for finding duplicates, it stores the original index of the element that hashed to a bucket.
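To make the contrast concrete, here is a minimal sketch (not from the lecture) of a Bloom filter with a single hash function next to a plain Python set; the bit-array size `m` and the use of Python’s built-in `hash` are illustrative choices.

```python
class TinyBloomFilter:
    """Bloom filter with k = 1 hash function: one bit per bucket, nothing else."""
    def __init__(self, m):
        self.m = m
        self.bits = [0] * m          # only "something hashed here" flags

    def add(self, x):
        self.bits[hash(x) % self.m] = 1

    def might_contain(self, x):
        # True can be a false positive (another element may have set this bit).
        return self.bits[hash(x) % self.m] == 1

# Hash-table-based membership: stores the elements themselves, so no false positives.
seen = set()
bloom = TinyBloomFilter(m=64)

for word in ["alpha", "beta", "gamma"]:
    seen.add(word)
    bloom.add(word)

print("delta" in seen)               # always correct: False
print(bloom.might_contain("delta"))  # may print True even though "delta" was never added
```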
Collision Probabilities and Space
Let’s consider the scenario where we want to bound the expected number of “undesired events” (hash collisions for the table, false positives for the Bloom filter).
Floyd’s Cycle-Finding Algorithm (“Tortoise and Hare”)
This algorithm is used to find a duplicate number in an array where each value is treated as a pointer to the next index, forming a functional graph (linked list structure). The duplicate corresponds to a cycle in this structure.
Check out the video below which explains this in a much better way imo…
Diagram
You can visualize this with the classic rho (ρ) structure:
```
[Start] → → → [Cycle Start] → → → [Meeting Point]
 \___________/ \__________________/
    stem (k)        cycle (ℓ)
```
Assumptions
You are given an array `a` of length $n$ with values in the range $\{1, \dots, n-1\}$. By the pigeonhole principle, there is at least one duplicate.

We define a function:

$$f(x) = a[x]$$

This defines a sequence:

$$x_0 = n, \qquad x_{k+1} = f(x_k) = a[x_k]$$

Eventually, this sequence enters a cycle. Our goal is to find two distinct indices $i \neq j$ such that:

$$a[i] = a[j]$$
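As a quick illustration (a toy example of my own, not from the lecture), here is how the sequence behaves for a small array; the concrete values are arbitrary.

```python
# Toy example: 1-indexed array of length n = 5 with values in {1, ..., 4}.
# a[0] is a dummy entry so that a[i] matches the 1-based indexing used above.
a = [None, 3, 1, 4, 2, 4]
n = 5

x = n
for step in range(10):
    print(x, end=" -> ")
    x = a[x]
print(x)
# Output: 5 -> 4 -> 2 -> 1 -> 3 -> 4 -> 2 -> 1 -> 3 -> 4 -> 2
# The sequence enters the cycle 4 -> 2 -> 1 -> 3 -> 4; the cycle start 4
# is exactly the duplicated value (a[3] = a[5] = 4).
```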
Step-by-Step Algorithm
Phase 1: Detect Cycle (Find Meeting Point)
Use two pointers:

- `tortoise`: moves 1 step at a time: `tortoise = a[tortoise]`
- `hare`: moves 2 steps at a time: `hare = a[a[hare]]`
Initialize:
```python
tortoise = a[n]
hare = a[a[n]]
```
Loop until they meet:
```python
while tortoise != hare:
    tortoise = a[tortoise]
    hare = a[a[hare]]
```
At this point, both pointers are in the cycle and at the same position.
Phase 2: Find Start of Cycle (Duplicate Value)
Reset one pointer (`tortoise`) to the beginning:

```python
tortoise = n
```
Move both pointers one step at a time:
```python
while tortoise != hare:
    tortoise = a[tortoise]
    hare = a[hare]
```
They now meet at the start of the cycle, which corresponds to the duplicate value:
```python
duplicate_value = hare  # or tortoise
```
Phase 3: Find Indices $i \neq j$ such that $a[i] = a[j]$
Find the first index `i` (the predecessor of the duplicate value on the path from the start):

```python
i = n
while a[i] != duplicate_value:
    i = a[i]
```
Find the second index `j` (a predecessor of the duplicate value inside the cycle):

```python
j = a[duplicate_value]  # move one step into the cycle
while a[j] != duplicate_value:
    j = a[j]
```
Now you have $a[i] = a[j]$ with $i \neq j$.

Return the pair $(i, j)$, or sort it if needed:

```python
return (i, j)
```
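Putting the three phases together, here is a self-contained sketch of the whole procedure (my own consolidation of the steps above, not code shown in the lecture); it assumes the same 1-indexed array `a[1..n]` with values in $\{1, \dots, n-1\}$, so the Python list carries a dummy entry at index 0.

```python
def find_duplicate_indices(a, n):
    """Floyd's tortoise-and-hare on the sequence x_0 = n, x_{k+1} = a[x_k].

    a is 1-indexed (a[0] is a dummy), len(a) == n + 1, values in {1, ..., n-1}.
    Returns indices (i, j) with i != j and a[i] == a[j].
    """
    # Phase 1: detect the cycle (find a meeting point inside it).
    tortoise, hare = a[n], a[a[n]]
    while tortoise != hare:
        tortoise = a[tortoise]
        hare = a[a[hare]]

    # Phase 2: find the start of the cycle = the duplicate value.
    tortoise = n
    while tortoise != hare:
        tortoise = a[tortoise]
        hare = a[hare]
    duplicate_value = hare

    # Phase 3: recover two distinct indices holding the duplicate value.
    i = n
    while a[i] != duplicate_value:
        i = a[i]
    j = a[duplicate_value]  # one step into the cycle
    while a[j] != duplicate_value:
        j = a[j]
    return (i, j)


# Example: n = 5, values in {1, ..., 4}; the value 4 appears at indices 3 and 5.
a = [None, 3, 1, 4, 2, 4]
print(find_duplicate_indices(a, 5))  # (5, 3): a[5] == a[3] == 4
```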
Why It Works
I didn’t like the way it was proven in the lecture, so I’ve attached a more intuitive proof below…
Time and Space Complexity
- Time: $O(n)$ - each pointer traverses $O(n)$ steps
- Space: $O(1)$ - only a constant number of pointers is used
Primality Testing
Problem
Given an integer $n \geq 2$, determine whether $n$ is a prime number (i.e., has no divisors other than 1 and itself).
Application:
Efficiently generating large random prime numbers is critical in cryptography, especially in RSA encryption. For example, RSA often uses primes with around 1000 bits.
The standard approach:
- Generate a random large number $n$.
- Test if $n$ is prime.
- If it is, use it. Otherwise, repeat.
By the Prime Number Theorem, the density of primes around a large number $n$ is approximately:

$$\frac{1}{\ln n}$$

For a 1000-bit number $n$, we estimate $\ln n \approx 1000 \cdot \ln 2 \approx 693$. So, roughly 1 in every 693 such numbers is prime. Thus, the primality test must be very efficient, as we’ll need to test hundreds of candidates on average.
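A rough sketch of that generate-and-test loop (illustrative only): `is_probably_prime` is a stand-in implemented with naive trial division so the snippet runs on its own; in practice it would be a fast randomized test such as Miller-Rabin, and the bit length would be far larger.

```python
import random

def is_probably_prime(n):
    # Placeholder: naive trial division, only fine for small demo sizes.
    # In practice this would be a fast randomized test such as Miller-Rabin.
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def random_prime(bits):
    """Keep drawing random odd candidates of the given bit length until one passes."""
    while True:
        candidate = random.getrandbits(bits) | (1 << (bits - 1)) | 1  # force top bit and oddness
        if is_probably_prime(candidate):
            return candidate

print(random_prime(24))  # small bit length so trial division stays fast
```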
A Brief History of Primality Testing
- ~200 BC (Eratosthenes): Sieve of Eratosthenes: Finds all primes $\leq n$. Efficient for small $n$, but not polynomial in $\log n$ (i.e., the input size in bits).
- 1976 (Miller, Rabin): Introduced a Monte Carlo primality test with one-sided error:
  - If $n$ is prime, it always says “prime”.
  - If $n$ is composite, it says “composite” with high probability (≥ 3/4). (False positives can occur, but their probability can be made arbitrarily small through repetition.)
- 1977 (Solovay, Strassen): Another Monte Carlo test, based on quadratic residues.
- 1987 (Adleman, Huang): A different probabilistic approach that never falsely accepts composites, but might wrongly reject a prime.
- 2002 (Agrawal, Kayal, Saxena - AKS): Developed the first deterministic, polynomial-time primality test. A major theoretical milestone, but still too slow in practice for cryptographic sizes compared to randomized methods like Miller-Rabin.
Basic Primality Tests (Flawed or Inefficient)
1. Naive Trial Division:

Check if any $d \in \{2, \dots, \lfloor \sqrt{n} \rfloor\}$ divides $n$.

- Problem: Too slow - takes up to $\sqrt{n}$ divisions.
- For $n \approx 2^{1000}$, that is about $2^{500}$ divisions - computationally infeasible.
- We need algorithms that run in time polynomial in $\log n$ (the number of bits of $n$).
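For reference, a direct sketch of trial division that also counts how many divisions it performs (my own illustration of the $\sqrt{n}$ cost):

```python
def trial_division(n):
    """Returns (is_prime, number_of_divisions_performed)."""
    divisions = 0
    d = 2
    while d * d <= n:       # about sqrt(n) iterations in the worst case
        divisions += 1
        if n % d == 0:
            return False, divisions   # found a nontrivial divisor
        d += 1
    return True, divisions

# For a prime, the loop runs all the way up to sqrt(n):
print(trial_division(999983))                 # (True, 998) - about sqrt(n) ≈ 1000 divisions
# For a 1000-bit prime the same count would be about 2^500 divisions:
print(f"2^500 ≈ {2.0**500:.3e} divisions")
```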
2. Randomized Trial Division:

- Choose $d \in \{2, \dots, \lfloor \sqrt{n} \rfloor\}$ uniformly at random.
- If $d \mid n$, return “composite”; else return “prime”.
- Problem: If $n$ is composite but has only large prime factors, the test is unlikely to find a divisor, leading to a high error probability.
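A quick experiment (my own, with arbitrarily chosen small primes) makes this failure mode visible: for $n = pq$ with two roughly equal prime factors, only a single value of $d$ in the whole range reveals that $n$ is composite.

```python
import random

def randomized_trial_division(n):
    """One random divisor candidate; says 'prime' unless it happens to divide n."""
    d = random.randint(2, int(n**0.5))
    return "composite" if n % d == 0 else "prime"

# n = p * q with two ~5-digit prime factors: only d = 10007 reveals compositeness.
n = 10007 * 10009
trials = 100_000
wrong = sum(randomized_trial_division(n) == "prime" for _ in range(trials))
print(f"{wrong}/{trials} runs wrongly report 'prime'")  # almost all of them
```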
Euclid-Based Primality Test
Key Insight:

If $n$ is prime, then for all $a \in \{1, \dots, n-1\}$:

$$\gcd(a, n) = 1$$

These $a$’s form the multiplicative group $\mathbb{Z}_n^*$, which has $\varphi(n)$ elements.

- If $n$ is prime, then $\varphi(n) = n - 1$.
EUCLID-PRIMZAHLTEST Algorithm:

- Pick $a \in \{2, \dots, n-1\}$ uniformly at random.
- Compute $\gcd(a, n)$ (via the Euclidean algorithm).
- If $\gcd(a, n) > 1$: return “composite”.
- If $\gcd(a, n) = 1$: return “prime”.
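A direct Python rendering of this test (the algorithm is from the lecture, the code itself is just a sketch):

```python
import math
import random

def euclid_primality_test(n):
    """One round of the Euclid-based test: a random a and a single gcd computation."""
    a = random.randint(2, n - 1)      # pick a uniformly from {2, ..., n-1}
    if math.gcd(a, n) > 1:
        return "composite"            # gcd reveals a nontrivial common factor
    return "prime"                    # possibly a false positive if n is composite

# For an RSA-type composite, almost every a is coprime to n, so the test
# almost always answers "prime" even though n is composite:
n = 10007 * 10009
print([euclid_primality_test(n) for _ in range(10)])
```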
Analysis:
- If $n$ is prime: $\gcd(a, n) = 1$ for all $a \in \{2, \dots, n-1\}$. So the test always correctly returns “prime” - no false negatives.
- If $n$ is composite:
  - The test returns “prime” (a false positive) whenever the chosen $a$ is coprime to $n$.
  - The error probability is:

$$\Pr[\text{error}] = \frac{\left|\{a \in \{2, \dots, n-1\} : \gcd(a, n) = 1\}\right|}{n - 2} = \frac{\varphi(n) - 1}{n - 2} \approx \frac{\varphi(n)}{n}$$
- Example: RSA-type numbers

  Let $n = p \cdot q$, where $p$, $q$ are large distinct primes:

  - Then $\varphi(n) = (p - 1)(q - 1)$.
  - So

  $$\frac{\varphi(n)}{n} = \frac{(p-1)(q-1)}{pq} = \left(1 - \frac{1}{p}\right)\left(1 - \frac{1}{q}\right),$$

  which is very close to 1. That means most random $a$’s will not reveal that $n$ is composite.
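Even with small, arbitrarily chosen primes (far below cryptographic sizes), this ratio is already very close to 1:

```python
n, p, q = 10007 * 10009, 10007, 10009
phi = (p - 1) * (q - 1)
print(phi / n)   # ≈ 0.9998 - already very close to 1 even for 5-digit primes
```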
Example: $p$ and $q$ each around 1000 bits (as in RSA)

- Then $\frac{1}{p}, \frac{1}{q} \approx 2^{-1000}$,
- So the error rate is $\approx 1 - 2^{-999}$, again close to 1.
Conclusion: This test is not reliable for composite numbers - especially those used in cryptography.
We need tests with guaranteed low error probabilities (e.g., ≤ 1/4), regardless of $n$’s factorization.
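As a side note, once a one-sided test with error probability at most 1/4 per round exists, independent repetitions push the error down geometrically; here is a sketch of that amplification wrapper (the single-round test is only a placeholder here):

```python
def amplified_test(n, single_round_test, rounds=20):
    """If each round has one-sided error <= 1/4 for composites, then answering
    'prime' after k independent rounds is wrong with probability <= (1/4)**k."""
    for _ in range(rounds):
        if single_round_test(n) == "composite":
            return "composite"   # one-sided error: a "composite" answer is always correct
    return "prime"               # error probability at most (1/4)**rounds

# Usage sketch with a hypothetical single-round test `miller_rabin_round`
# (to be defined in the next lecture):
# print(amplified_test(n, miller_rabin_round, rounds=50))
```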
Looking Ahead
In the next lecture, we’ll explore probabilistic primality tests like Miller-Rabin, which are far more robust. These tests are based on Fermat’s Little Theorem and deeper properties of modular arithmetic, offering significantly lower error rates for composite numbers - even those used in cryptography.
Continue here: 18 Fermat and Miller-Rabin Primality Tests