Lecture from: 03.12.2024 | Video: Videos ETHZ

The Java Collections Framework

Interface List

Iterating and Modifying Collections: Caution!

Let us continue with the previous example of intersect.

Example Intersection

For variant 1 see previous lecture note…

Variant 2 (Modifying a copy)

Notice the crucial i-- after remove() to adjust the index since the list size changes.

Variant 3 (Using retainAll)

The retainAll() method elegantly performs the intersection operation.

Searching in Lists

  • LinkedList: Searching using contains(Object) is inefficient (linear time complexity, ).

    List<Integer> l = new LinkedList<>(Arrays.asList(9, 2, 3, 4, 2, 1));
    System.out.println(l.contains(1)); // true
  • ArrayList: Searching with contains(Object) or indexOf(Object) is also inefficient (linear time, ), but if the list is sorted a more efficient search is possible (binary search).

    List<Integer> a = new ArrayList<>(Arrays.asList(9, 2, 3, 4, 2, 1));
    System.out.println(a.contains(1)); // true
    System.out.println(a.indexOf(1)); // 5
  • Sorted ArrayList: If an ArrayList is sorted, you can use Collections.binarySearch() which significantly improves search performance (logarithmic time, ).

    List<Integer> a = new ArrayList<>(Arrays.asList(9, 2, 3, 4, 2, 1));
    Collections.sort(a);  // Sort the ArrayList first!
    int index = Collections.binarySearch(a, 9); // Returns the index
    if (index >= 0) { // Element found
      System.out.println("Found at index: " + index);
    }else{
      System.out.println("Not found.");
    }

Important: Binary search only works correctly on sorted lists. Using binary search on an unsorted list will produce unpredictable results. Binary search although allowed with some extra steps, is not efficient for a LinkedList due to its non-random access characteristics.

Comparing Elements: compareTo and Comparable

Comparing elements is essential for sorting and other operations. Java provides mechanisms for establishing a natural ordering for objects.

Equality vs. Ordering

  • Equality (==, equals()): Used to determine if two objects are equal in value. The == operator compares references, and equals() should be overridden to compare object content.

  • Natural Ordering (<, >, compareTo()): Used to establish an ordering between objects (less than, greater than, or equal). The < and > operators are not applicable to reference types, so Java uses the compareTo() method.

The Comparable Interface

The Comparable<E> interface defines a single method, compareTo(E other), which is used to compare the current object to another object of the same type. Note that the implementation of compareTo should be consistent with the implementation of equals() in your custom classes, in particular the sign of compareTo() should match the boolean return value of equals(). That is: if e1.compareTo(e2) == 0, then we want e1.equals(e2) == true and vice versa. This consistency ensures that collections can function predictably when using compareTo and equals.

class E implements Comparable<E>{
  public int compareTo(E other) { ... }
    // Returns:
    //   < 0 if this object is less than other
    //   > 0 if this object is greater than other
    //   = 0 if this object is equal to other
}

compareTo() in the String Class

The String class implements Comparable and provides a natural ordering based on lexicographical (alphabetical) order.

Implementing compareTo() for Custom Classes

For your own classes, you need to implement the Comparable interface and provide a compareTo() method if you want to be able to compare and sort instances of those classes.

Example:

This example shows how to implement compareTo() for a Person class by delegating the comparison to the name attribute (a String).

Sorting with java.util.Collections

The Collections.sort() method can be used to sort List objects that contain elements of a Comparable type.

The Collections.sort() method uses the compareTo() method defined by each object in the list to determine the correct order and modifies the original list.

The Comparable Interface for Sorting Custom Objects

To sort a list of custom objects using Collections.sort(), your class must implement the Comparable interface.

Example:

class Person {
    String name;
    int age;
    
    public int compareTo(Person other) {
        return name.compareTo(other.name);
    }
}
 
ArrayList<Person> a = new ArrayList<>();
a.add(new Person("Ben", 19));
a.add(new Person("Zoe", 25));
// ... add more
Collections.sort(a);
// Does NOT work because Person implements Comparable

Since Person did not implement Comparable, the Collections.sort() call would result in a compile-time error. Comparable enforces a consistent comparison method used for sorting.

public class Person implements Comparable<Person> {
	// ...
	public int compareTo(Person other) { /*...*/ }
}

Another Example: Sortable 2D Points

public class Point implements Comparable<Point> {
    private int x;
    private int y;
 
    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
 
    @Override
    public int compareTo(Point other) { 
	    //Define ordering for point
        if (x < other.x || (x == other.x && y < other.y)) {
            return -1;
        }
        if (x == other.x && y == other.y) {
            return 0;
        }
        return 1;
    }
}

The compareTo method defines an ordering relation on two given points using their x and y coordinates.

Consistency of equals() and compareTo()

For any class, the compareTo() method should be consistent with the equals() method.

  • Consistency Rule: If e1.compareTo(e2) == 0, then e1.equals(e2) should be true, and vice-versa. This consistency is essential to maintain predictable behavior for data structures like TreeSet, where comparisons with compareTo and equals are involved.

  • Exception for null: e1.compareTo(null) should throw a NullPointerException, but e1.equals(null) must always be false.

Summary: Sorting and Comparable

To use Collections.sort() on a list of custom objects, your class must implement Comparable and define a compareTo() method consistent with equals(). Typical compareTo implementations often involve delegation (e.g., comparing based on a String attribute) or defining comparisons based on multiple attributes.

Interface Set

The Set interface represents a non-associative collection that does not allow duplicate elements. It extends the Collection interface. Set implementations use the equals() method to determine element equality. The add() method returns false if you try to add an element that is already present in the set (according to the equals() method). Sets are typically unordered, meaning that there’s no guarantee about the sequence in which elements are retrieved during iteration. Note: LinkedHashSet offers a predictable iteration order, but this is not the focus of this part of the lecture.

public interface Set<E> extends Collection<E> {
    // ... methods inherited from Collection ...
 
    //add(), remove(), contains(), isEmpty()
}

Typical Set Operations

Common set operations include intersection, union, and difference:

  • Intersection: a.retainAll(b) modifies set a to contain only the elements that are also present in set b.

  • Union: a.addAll(b) adds all elements of set b to set a. Note: since b is a set it does not contain duplicate elements.

  • Difference: a.removeAll(b) removes all elements of set b from set a.

Set Implementations

The Java Collections Framework provides several Set implementations:

  • HashSet: Uses a hash table for efficient element lookup (on average for add(), remove(), and contains()).

  • TreeSet: Stores elements in a sorted order using a balanced binary search tree ( time complexity for most operations).

  • LinkedHashSet: Maintains insertion order during iteration, but we will not cover this now.

TreeSet: Idea and Example

TreeSet elements are stored in a balanced binary search tree, which ensures efficient insertion, deletion, and search (). TreeSet requires an ordering relation between elements. This means the element type must implement the Comparable interface or you must provide a Comparator (more on this later). TreeSet is efficient for ordered set operations.

Example:

int[] a = {4, 5, 16, 8, 6, 1, 6, 1, 3};
Set<Integer> set = new TreeSet<>();
for (int i = 0; i < a.length; i++) {
    set.add(a[i]);
}
System.out.println(set); // Output: [1, 3, 4, 5, 6, 8, 16] (natural ordering of Integer)

Note that a TreeSet will store its elements in ascending order based on their natural ordering by default. The natural ordering is given by the objects’ compareTo() methods. In this specific example, the integers would be output in their ascending numerical order.

TreeSet: Importance of compareTo() and Runtime Exception

If you use TreeSet with a custom class that doesn’t implement Comparable, you’ll get a ClassCastException at runtime, not a compile-time error. This can be tricky to debug.

class Point {
    int x, y;
    // ... constructor
    // equals() is implemented but compareTo() is missing!
}
 
Set<Point> points = new TreeSet<>(); // No compile-time error
points.add(new Point(5, 4));       // Runtime ClassCastException!

Recommendation: Always implement both equals() and compareTo() (and hashCode(), covered later) when designing classes that might be stored in collections like TreeSet.

TreeSet: Order-Based Operations

TreeSet provides methods for order-based operations, exploiting its sorted nature:

  • descendingSet(): Returns a view of the set in reverse order.
  • first(), last(): Return the smallest and largest element, respectively.
  • lower(E e), higher(E e): Return the largest element strictly less than e, and the smallest element strictly greater than e, respectively.
  • floor(E e), ceiling(E e): Return the largest element less than or equal to e, and the smallest element greater than or equal to e, respectively.
  • headSet(E e), tailSet(E e): Return a subset containing elements strictly less than e, and greater than or equal to e, respectively. There are also versions of these methods which include the element e in the subset.

More Info here: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/TreeSet.html

HashSet: Idea and Example

HashSet uses a hash table for efficient element storage and retrieval. Elements do not maintain a specific order. HashSet requires a properly implemented hashCode() method (discussed below) rather than compareTo(), but it’s recommended to implement a proper equals as well for better control of duplicate handling.

Example:

int[] a = {4, 5, 55, 66, 111, 1, 166};
Set<Integer> set = new HashSet<>();
for (int i = 0; i < a.length; i++) {
    set.add(a[i]);
}
System.out.println(set); // Output: Order is not guaranteed (might be [1, 66, 5, 55, 4, 111, 166] or different)
Hashing and hashCode()

Hashing is a technique used by HashSet and HashMap to efficiently store and retrieve elements. A hash function maps an object to an integer value (the hash code). Hash codes are used to determine the “bucket” in the hash table where an element should be stored. This allows for very fast average case lookups (). The hash code must not change over time unless the object is specifically mutated.

  • The hashCode() Method: Defined in the Object class, hashCode() returns an int representing the object’s hash code. All Java classes automatically inherit a default implementation of hashCode() from Object.

  • Overriding hashCode(): For custom classes used in hash-based collections, you should override hashCode() so that:

    1. Equal objects (according to equals()) have the same hash code.
    2. Unequal objects have different hash codes. This minimizes collisions and improves the distribution of elements in the hash table. (But not strictly required)
  • Ideal hashCode() Implementation: Ideally, a hash function maps objects to unique integers. This minimizes collisions and maximizes performance. However, in practice, collisions may still happen, and hash tables need to deal with collision resolution. For hash set this occurs by storing multiple elements which have the same hash value in a list within the corresponding hash bucket.

  • Implementing hashCode(): The default implementation of hashCode() is fine as long as you are comparing objects via their memory addresses. If however, you have overridden the equals() a method for meaningful equality of your custom objects, you absolutely have to also implement a proper hashCode() method for hash based sets to work correctly!

  • Overriding hashCode() and equals(): If you override either hashCode() or equals(), you should always override the other as well to ensure that these two methods remain consistent.

Example Implementation:

HashSet Warning: A non-trivial hash function is crucial for the efficiency of a HashSet. HashSet is not suitable for queries that rely on order (e.g. finding the minimum element of a set). Such operations are not possible (or very inefficient) as HashSet elements are not sorted or ordered. Use a TreeSet for order-related operations or operations where uniqueness is important. For just insertion, removal and checks whether some element is contained, a hash set is a good choice.

Continue here: 22 Java Collection Framework, Interface Set, Interface Map, Comparator Interface