Lecture from: 06.12.2024 | Video: Videos ETHZ

The Java Collections Framework

Interface Set

No Duplicates in Sets

The add() method in a Set returns true if the element was successfully added, and false if the element was already present.

Example:

Set<String> unis = new TreeSet<>();
System.out.println(unis.add("ETH Zurich")); // true
System.out.println(unis.add("ETH Zurich")); // false (already present)
System.out.println(unis.add("ETH Lausanne")); // true
// ...
System.out.println(unis); //[EPFL, ETH Lausanne, ETH Zurich, MIT]

Note: The order of elements in the output might vary for HashSet. It is guaranteed to be in natural order for TreeSet though.

Duplicates: Warning 1 and 2

  • Warning 1: If you don’t override equals() for custom objects in a HashSet, duplicates might be added because the default equals() compares references, not content.

  • Warning 2: If objects in a HashSet are modified after being added, duplicates might be introduced without triggering any errors. HashSet does not re-check for duplicates after modification. For this reason objects should not be changed while contained in a HashSet! Otherwise first remove that object, then change it and then add it to the set again.

Example: Priority Queue

A priority queue is a data structure that stores elements with associated priorities. The element with the highest priority is always retrieved first.

Implementation Choices:

  • Using unsorted lists is inefficient because we would have to iterate through the list each time to extract the element with minimum priority.
  • Sorted lists have fast retrieval of the smallest element, however, insertion into a sorted list is slow.
  • HashSet: Finding the minimum element is not directly supported.
  • TreeSet: A TreeSet maintains a sorted order efficiently, so finding and removing the smallest element is fast.

Java already provides a PriorityQueue implementation, but this example serves as an illustration of how you could create one if necessary.

Using a TreeSet to implement a priority queue is a good choice, but usually it’s more efficient to use a Heap.

Example using TreeSet:

class Task implements Comparable<Task> {
    String name;
    int priority;
 
    public Task(String n, int p) {
        name = n;
        priority = p;
    }
 
    @Override
    public String toString() {
        return name + " (" + priority + ")";
    }
 
    @Override
    public int compareTo(Task t) {
        return t.priority - priority; // Higher priority comes first
    }
 
	// ... needs equals override too ...
}
 
 
TreeSet<Task> todo = new TreeSet<>();
todo.add(new Task("Sport", 2));
todo.add(new Task("Sleep", 6));
todo.add(new Task("EProg", 4));
todo.add(new Task("Sport", 5));
//...
while (!todo.isEmpty()) {
    Task t = todo.first(); //Takes O(log n)
    todo.remove(t); //Takes O(log n)
    System.out.println(t);
}
//Output:
//Sleep (6)
//Sport (5)
//EProg (4)
//Sport (2)

In this example, Task objects are comparable based on their priorities. The TreeSet automatically stores elements in order so retrieval is simple.

Iteration with the For-Each Loop

Iterating over collections and arrays is simplified by the for-each loop (enhanced for loop). It provides a concise way to access each element without explicit indexing.

Syntax:

for (ElementType element : collectionOrArray) {
    // Do something with element
}

Example:

static void printAll(Collection<String> c) {
    for (String element : c) {
        System.out.print(element + " ");
    }
    System.out.println();
}
 
Set<String> s = new HashSet<>(Arrays.asList("Zoe", "Anna", "Vera"));
printAll(s); // Might print: Zoe Vera Anna (order not guaranteed for HashSet)
 
Set<String> ss = new TreeSet<>(Arrays.asList("Zoe", "Anna", "Vera"));
printAll(ss); // Prints: Anna Vera Zoe (natural ordering for TreeSet)
 
List<String> l = new LinkedList<>(Arrays.asList("Zoe", "Anna", "Vera"));
printAll(l); // Prints: Zoe Anna Vera (insertion order for LinkedList)

The for-each loop works with any object that implements the Iterable interface (which Collection extends).

Iteration Order

The order of iteration depends on the collection type:

  • ArrayList, LinkedList, Array: Elements are iterated in their index order (from 0 to size - 1).

  • TreeSet: Elements are iterated in their natural sorted order (according to compareTo()).

  • HashSet: Iteration order is not defined and depends on the implementation of the hashing algorithm.

  • LinkedHashSet: Elements are iterated in the order they were inserted (not covered in detail in the lecture).

Map Interface

The Map interface represents an associative data structure that stores key-value pairs. Each key must be unique (keys effectively form a Set), but multiple keys can map to the same value. Map implementations provide efficient methods for retrieving values based on their associated keys.

More info here: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/Map.html

Map Characteristics

  • Associative: Stores key-value pairs (like a dictionary).
  • No Key Duplicates: Each key can appear only once. Values can be duplicated.
  • No Ordering or Indexing: Elements are not ordered, not indexed. (Note: TreeMap provides a sorted ordering based on keys.)

Map Interface Methods

Key methods of the Map<K, V> interface include:

  • put(K key, V value): Associates the specified value with the specified key in this map. If the map previously contained a mapping for the key, the old value is replaced.
  • replace(K key, V value): Replaces the entry for the specified key only if it is currently mapped to some value.
  • remove(Object key): Removes the mapping for a key from this map if it is present.
  • get(Object key): Returns the value to which the specified key is mapped, or null if this map contains no mapping for the key.
  • getOrDefault(Object key, V defaultValue): Returns the value to which the specified key is mapped, or defaultValue if this map contains no mapping for the key.
  • containsKey(Object key): Returns true if this map contains a mapping for the specified key.
  • containsValue(Object value): Returns true if this map maps one or more keys to the specified value.
  • keySet(): Returns a Set view of the keys contained in this map.
  • values(): Returns a Collection view of the values contained in this map.
  • isEmpty(): Returns true if this map contains no key-value mappings.
  • size(): Returns the number of key-value mappings in this map.
public interface Map<K, V> { ... } // K for key type, V for value type

Example: Map Usage

Map<String, Integer> numberOfLetters = new HashMap<>(); // Or TreeMap, etc.
numberOfLetters.put("if", 2);
numberOfLetters.put("in", 2); // Duplicate values are allowed
numberOfLetters.put("byte", 4);
numberOfLetters.put("why", 3);
 
System.out.println(numberOfLetters.get("why"));    // 3
System.out.println(numberOfLetters.get("whyy"));   // null (no such key)
System.out.println(numberOfLetters.getOrDefault("why", -1)); // 3
System.out.println(numberOfLetters.getOrDefault("whyy", -1));// -1 (default value)
System.out.println(numberOfLetters.containsKey("why"));  // true
System.out.println(numberOfLetters.containsValue(2)); // true

getOrDefault()

The getOrDefault() method is a convenient way to retrieve a value from a Map, providing a default value if the key is not found. This avoids explicit null checks.

int nr = numberOfLetters.getOrDefault("whyy", -1); // nr will be -1 if "whyy" is not a key

Overwriting Values in a Map

If you put() a key-value pair into a Map where the key already exists, the old value is overwritten with the new value. No explicit removal is necessary. Modifying the value associated with a key is permitted. However, modifying the key itself may lead to inconsistencies and should be avoided, as it would break the internal structure of the map.

numberOfLetters.put("why", 4);  // Overwrites the previous value (3)
numberOfLetters.put("why", 3);  // Overwrites again

Map Implementations

The Java Collections Framework provides various Map implementations:

  • HashMap: Uses a hash table for efficient lookup (average for most operations).

  • TreeMap: Stores entries in sorted order based on keys’ natural ordering or a supplied comparator ( for most operations). Keys must be Comparable or you must provide a Comparator.

  • LinkedHashMap: Maintains insertion order (not covered in detail here).

The choice of map implementation depends on the use case. HashMap is the most common because it is quite efficient for lookups, while if operations like finding the smallest/largest key or range queries are required, a TreeMap may be a good choice. TreeMap can store elements in a sorted manner according to the keys’ natural ordering (provided by Comparable) or based on a Comparator object.

Example: TreeMap Usage

TreeMap is a good choice when maintaining a sorted order over the keys is desired.

TreeMap: Order-Based Operations

Similar to TreeSet, TreeMap provides methods for order-based operations:

  • descendingMap(), descendingKeySet(): Return views of the map/key set in reverse order.
  • firstEntry(), firstKey(), lastEntry(), lastKey(): Return the first/last entry/key in the sorted order.
  • lowerKey(K key), higherEntry(Map.Entry<K,V> e) : Return the largest key strictly less than key or the first entry strictly greater than the entry e.
  • floorKey(K key), ceilingEntry(Map.Entry<K,V> e): Return the largest key less than or equal to key, or the first entry greater than or equal to the entry e.
  • headMap(K toKey), tailMap(K fromKey): Return a submap containing entries strictly before/after toKey/fromKey in the sorted order. There exist similar methods to define inclusive ranges. .

Map Types

Keys and values in a Map must be reference types. You can use wrapper classes for primitive types. The types K and V can be interfaces, abstract classes, or concrete classes, making Map a very versatile structure.

Multiple Values per Key

If you need to store multiple values per key, you can use an auxiliary Record class (or other composite value type) to encapsulate the multiple values. Map only allows exactly one value per key. In those cases where multiple values per key are needed, you have to store them in some data structure as value or store multiple copies with the same key. However, the Collections framework offers a better solution using the Record class.

Example:

class Record {
    String name;
    int grade;
 
    public Record(String name, int grade) {
        this.name = name;
        this.grade = grade;
    }
//Override toString for nice output if needed
}
 
 
Map<String, Record> students = new HashMap<>();
students.put("11919040", new Record("Manuela", 1));
students.put("09123434", new Record("Malte", 6));
 
System.out.println(students); // {11919040=... , 09123434=...}
//With overridden toString() in Record
//Output:
//{11919040=Manuela:1, 09123434=Malte:7}

The Record class stores both the name and grade. The Map then uses the student ID as the key and the Record object as the value.

keySet() and values()

  • keySet(): Returns a Set containing all the keys in the Map. You should assign the result to a Set<K> or its supertype but not to a specific Set implementation directly. You cannot directly modify a map by modifying its keySet or values view. To alter the original map, use direct operations on the map itself.

  • values(): Returns a Collection containing all the values in the Map. Note: Values are not guaranteed to be unique. Again, you should assign it to a Collection<V> or a more general type but not to a concrete collection. It is not allowed to modify the values in the map by changing elements in values().

Example using keySet and values():

Map<String, String> myMap = new TreeMap<>();
myMap.put("zero", "null");
myMap.put("one", "eins");
myMap.put("two", "zwei");
myMap.put("four", "vier");
myMap.put("eight", "acht");
 
 
void mumble(Map<String, String> map) {
    Map<String, String> result = new TreeMap<>();
    for (String key : map.keySet()) {
        if (key.compareTo(map.get(key)) < 0) {
            result.put(key, map.get(key));
        } else {
            result.put(map.get(key), key);
        }
    }
    System.out.println(result);
}
 
mumble(myMap);
//{acht=eight, eins=one, null=zero, vier=four, zwei=two}

The mumble() function iterates through the keys of a given map.

Example on key/value asymetry:

Notice how we can map the German word to a English translation. The keys (German words) must be unique, but the values (English translations) can be duplicated.

Now lets try to make a new Map but have the English words as keys instead.

Here we stumble into the problem that since we had originally duplicate values, we know are overwriting the previous duplicate instances of our English word, hence we are also getting a smaller mapping.

Inverse Mapping

Creating an inverse mapping (swapping keys and values) can be tricky if values are not unique.

Example (Illustrative Problem):

Map<String, Double> map = new HashMap<>();
map.put("Ben", 5.0);
map.put("Ann", 4.0);
map.put("Zoe", 6.0);
map.put("Joe", 3.5);
 
Map<Double, String> mapInv = new HashMap<>(); 
// Using a HashMap can lead to data loss in the inverse mapping
// as each key maps to only one value.
 
for (String key : map.keySet()) {
    mapInv.put(map.get(key), key);
}
 
//Output of mapInv. If toString() was properly implemented
//{4.0=Ann, 5.0=Ben, 3.5=Joe, 6.0=Zoe}
 
System.out.println(mapInv);

If Ben and Ann both have a 5.0 grade, data is lost (Ann might be omitted).

Example (Solution):

By mapping values to sets of keys, you avoid data loss.

Inverse Mapping and Range Queries

If you need to perform range queries on the values (e.g., “find all students with grades greater than or equal to 4.0”), a TreeMap for the inverse map is a much better choice than HashMap:

HashMap:

TreeMap:

Using TreeMap, values are automatically sorted so tailMap is very efficient. You still need to loop over the values using for-each to be able to print all of them.

Hash vs Tree for Inverse Mappings:

  • HashMap: No ordering of values, range queries are not efficient.
  • TreeMap: Values are sorted according to their natural ordering (or a comparator), enabling efficient range queries.

Comparator Interface

The Comparator<E> interface provides a way to define an ordering for objects outside of the class definition (separate class). This is useful when:

  • You cannot modify the class to implement Comparable.
  • You need different orderings for different contexts.

Using a Comparator is useful as it allows us to define an ordering relation without modifying the corresponding class, so it can be done for even those classes we don’t control (e.g. String).

Interface:

public interface Comparator<E> {
  int compare(E o1, E o2);
}

A Comparator object defines a specific order for a given type using the compare method.

Example: Reverse Alphabetical Order

This Comparator can then be used with Collections.sort() or the constructors of sorted collections like TreeSet and TreeMap.

Continue here: 23 Java Collections Framework, Comparator Interface, Iterator Interface, Generics, ArrayList Example