Lecture from: 10.12.2024 | Video: Videos ETHZ

The Java Collections Framework

Comparator Interface

Example: Reverse Alphabetical Order

Continuation…

a. Sorting with a Comparator:

ArrayList<String> names = new ArrayList<>(Arrays.asList("Zoe", "Anna", "Vera"));
Collections.sort(names, new StringReverseComparator()); // Pass the comparator to sort
System.out.println(names); // Output: [Zoe, Vera, Anna] (reverse alphabetical)

b. TreeSet/TreeMap with a Comparator:

This shows how to use a Comparator with TreeSet and TreeMap at the time of creation to define the desired ordering.

Sorting by a Different Criterion (Example: Grades)

Suppose you want to sort students by grade and then by name if grades are the same:

 
class Student {
    String name;
    double grade;
    // Constructor
	
	//...
}
 
class StudentGradeComparator implements Comparator<Student> {
    @Override
    public int compare(Student s1, Student s2) {
        if (s1.grade < s2.grade) return -1; // Compare by grade first
        if (s1.grade > s2.grade) return 1;
        return s1.name.compareTo(s2.name); // Then compare by name if grades are equal
    }
}
 
 
//...then either...
ArrayList<Student> students = new ArrayList<>();
//...add student records
Collections.sort(students, new StudentGradeComparator()); // Sort by grade
 
//...or...
TreeSet<Student> students1 = new TreeSet<>(new StudentGradeComparator());
//...add student records

This code sorts Student objects by grade, using name to break ties. We can sort the list of students or store the students in a set using a Comparator for sorting according to grades.

Anonymous Classes for Comparator (Concise Syntax)

You can use anonymous classes to create a Comparator inline, directly as an argument, without defining a separate named class:

TreeSet<Student> students = new TreeSet<>(new Comparator<Student>() {
    @Override
    public int compare(Student s1, Student s2) {
        return Double.compare(s1.grade, s2.grade); // Compare by grades using Double.compare
    }
});
 

This simplifies the code, especially for one-time-use comparators.

compareTo(E) vs. compare(E, E)

Here is a summary comparing compareTo() and compare():

  • compareTo(E other): Defined in the Comparable interface. Implicitly called on the object you’re comparing (e.g. this). Takes one argument (the other object). Used to define the natural ordering of objects.

  • compare(E o1, E o2): Defined in the Comparator interface. Takes two explicit arguments (the objects being compared). Use this for customized sorting or when you can’t modify the class to implement Comparable.

Example with Comparator (Output Prediction)

class Student {
    String name;
    int grade;
 
    public Student(String name, int grade) {
        this.name = name;
        this.grade = grade;
    }
	//override toString() for output
}
 
class StudentComparatorLength implements Comparator<Student> {
  @Override public int compare(Student s1, Student s2){
      return s1.name.length() - s2.name.length();
  }
}
 
Comparator<Student> htc = new StudentComparatorLength();
Set<Student> scores = new TreeSet<>(htc);
scores.add(new Student("Kim", 38));
scores.add(new Student("Lisa", 94));
scores.add(new Student("Roy", 87));
scores.add(new Student("Marty", 43));
scores.add(new Student("Marisa", 72));
 
System.out.println(scores);
 
// Unexpected output: [Kim, Lisa, Marty, Marisa]

The reason we don’t print out “Roy” is because according to the StudentComparatorLength any two students with same name length are the same. We are inserting into a TreeSet which doesn’t add duplicates.

Now you might ask why does “Kim” not get overwritten by “Kim”. The reason is because in Sets (like TreeSet) the first instance stays but when you .put(k,v) then you always overwrite the previous entry.

Iteration II: Iterator Interface

The Iterator<E> interface allows traversing the elements of a Collection. Unlike the for-each loop, Iterator gives you explicit control over the iteration process and allows you to remove elements during iteration.

Iterator Interface: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/Iterator.html

Why sometimes explicit control is needed…

Now suppose we try to remove that word…

A “hacky” solution would be to make a copy of our TreeSet and iterate over the original to remove K/V pairs. Using iterators would yield us a way more elegant solution.

Iterator Methods

  • hasNext(): Returns true if there are more elements to iterate over.
  • next(): Returns the next element in the iteration. Throws NoSuchElementException if there are no more elements.
  • remove(): Removes the last element returned by next(). Can only be called once per call to next(). Throws an IllegalStateException if next has not yet been called, or remove has already been called after the last call to next.

Note: You can’t insert or modify an element via iterator, only remove!

Collection<String> collection = ...;
Iterator<String> it = collection.iterator();
 
while (it.hasNext()) {
    String element = it.next(); 
    // Get element and advance to next position in iterator. 
    // Exception if no more elements!
    // Do something with element
}

Iterable DS: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/lang/Iterable.html

Iterators and Data Structures

Each data structure has a specific way of iterating through its elements using Iterators:

  • ArrayList, LinkedList: Iterates in the order of indices (0 to size-1).
  • TreeSet: Iterates in the natural sorted order (according to compareTo()).
  • HashSet: Iteration order is not defined.

We’ll primarily focus on TreeSet because its iteration isn’t as straightforward as ArrayList or LinkedList.

Iterator with TreeSet (Example: Output and Intuition)

The images and animation associated with these examples are crucial for understanding how iterators work with TreeSet:

Now this is the same as our usual for-each loop. Now let’s try to remove elements…

it.next() moves to the next position in the tree and removes the current element (the one we just passed through) if remove() is called. This is important because removing an element can modify the TreeSet. The direct removal of an element via .remove(element) will also rebalance the TreeSet, but this process is not shown in the slides.

Incorrect Iterator Usage: ConcurrentModificationException

If you modify a collection directly (e.g. via .remove(element)) while iterating over it using an iterator, you will get a ConcurrentModificationException:

The only way to modify the collection during iteration with Iterator is via the iterator’s remove() method. it.remove() removes the correct element and updates the tree. If words.remove() is used, the tree is changed but it does not notice it and may lead to problems during the next iteration.

Iteration over Map

You can iterate through a Map by obtaining an Iterator over its keySet():

Map<String, Integer> scores = new TreeMap<>(); //...fill as before
Iterator<String> it = scores.keySet().iterator(); // Iterator over keys
while (it.hasNext()) {
    String student = it.next();
    if (scores.get(student) < 50) {
        it.remove();  // Removes the entry from the map
    }
}

Then access values using get().

Modifying Values during Map Iteration

You can modify the values in a Map during iteration (using either iterators or a for-each loop) via the replace method, but not via put since it requires modifying the keySet as well which is not allowed for iterators. The same applies for adding key-value pairs to a map during iteration with .put(k,v). Doing so will also throw ConcurrentModificationException.

Modifying the values directly through the Map does not affect the keyset iteration order nor does it cause a ConcurrentModificationException.

Overview and Comparison of Data Structures

Properties

Cheatsheet

Generics

Generics, introduced in Java 5, are a powerful feature that allows you to write type-safe code that can work with different types without the need for explicit casting. They enhance code reusability and maintainability, by allowing code to operate with various data types without compromising type safety.

Introduction to Generics

Before generics, collections in Java could store objects of any type (Object). This often required explicit downcasting when retrieving elements, which could lead to runtime errors (ClassCastException) if the cast was incorrect. Generics provide compile-time type checking, preventing these runtime cast exceptions.

The Purpose of Generics

The goals of generics are similar to those of generic methods, to make code applicable for various types.

Consider the following example:

//works for every type
System.out.println(x); 
//works for every type of array xs, but needs a seperate method for each primitive type in Java
Arrays.fill(xs, value); 

Now you might think that we have overloading, hence there is no need for any generics.

The problem arises, when we try to create attributes of arbitrary type. There is no way to “overload” these types.

public class Pair{
    E first; // E can be anything
    E second;
    //...
}
 
new Pair(1,2);
new Pair(true, false);
new Pair("links", "rechts");
 

We need a way to specify the type when instantiating an object. Since overloading is not allowed for attributes, we use type parameters.

Creating Generic Classes

Example: Adding Generics to a Class

Let’s look at how to add generics to a simple StringPair class:

Note that the generic type needs to be a reference type (e.g. Integer instead of int, …)

Generic Class Development

Here’s a recommended approach for developing generic classes:

  1. Consider if generics are necessary. Not all classes benefit from being generic.

  2. Develop and test a non-generic version (e.g. using a concrete type like String or Integer), to establish the basic functionality and ensure correctness before introducing generics.

  3. Generalize the class:

    • Replace the concrete type with a type parameter (e.g., E, T, Elem). Type parameter names are single, uppercase letters by convention.
    • Update the code to use the type parameter where appropriate.

Using Generic Classes

Pair<String> p1;  // Declaration
Pair<Integer> p2 = new Pair<Integer>(1, 2);  // Initialization with explicit type arguments
 
Pair<String> p3 = new Pair<>("one", "two");  // Initialization with type inference (diamond operator)

The diamond operator <> (introduced in Java 7) allows the compiler to infer the type arguments based on the context.

Example: Pair with Two Type Parameters

You can have multiple type parameters, each representing a different type.

In this example, Pair stores two elements of potentially different types.

Example: Implementing ArrayList<E>

Recall that ArrayList<E> uses a “growing array” to store elements. The generic implementation mimics the structure of a regular array but adds type safety and dynamic resizing.

  • Goal: To implement a simplified version of ArrayList<E>.
  • Minimal Functionality: We’ll focus on basic methods: ArrayList(), size(), get(int idx), set(int idx, E elem), add(E elem), remove(), and toString().

Implementation Details: Class, Attributes, Constructor

The constructor instantiates an Object array and casts it to E[]. The size attribute tracks number of added elements.

size()
public int size() {
    return size;
}

Returns the number of elements in the list.

get() and set()
public E get(int index) {
    if (index < 0) {
        throw new IllegalArgumentException("Index negative");
    } else if (size <= index) { //index must be smaller than size
        throw new IllegalArgumentException("Index exceeds size");
    }
    return elements[index];
}
 
public void set(int index, E element) {
    if (index < 0) {
        throw new IllegalArgumentException("Index negative");
    } else if (size <= index) {
        throw new IllegalArgumentException("Index exceeds size");
    }
    elements[index] = element;
 
}

The get() and set() methods throw IllegalArgumentException for invalid indices.

Side Note: Documentation

Methods should always be documented, especially when there are preconditions on the inputs (like valid index ranges for get() and set()).

  • Javadoc Format (Industry Standard):

    /**
     * Returns the element at the specified position in this list.
     *
     * @param  index index of the element to return
     * @return the element at the specified position in this list
     * @throws IndexOutOfBoundsException if the index is out of range
     *         (index < 0 || index >= size())
     */
    public E get(int index) { ... }

    Javadoc uses special tags like @param, @return, and @throws to describe the method’s parameters, return value, and potential exceptions.

  • Custom Format: You can also use other formats, such as specifying pre- and postconditions directly in comments.

    // PRE: 0 <= index < size()
    // POST: get(index) == element
    public void set(int index, E element) { ... }

    While not strictly required for this course, documenting your methods is highly recommended. It significantly improves code readability and maintainability.

add()
public void add(E element) {
	// If array is full, double capacity
    if (size == elements.length) {
        grow();
    }
    elements[size] = element;
    size++;
}
 
private void grow() {
    int newCapacity = elements.length * 2;
    E[] newElements = (E[]) new Object[newCapacity];
    // Built-in method for copying arrays
    // or using java.util.Arrays.copyOf()
    System.arraycopy(elements, 0, newElements, 0, size);
    elements = newElements;
}

The add() method appends an element to the end of the list. If the array is full, it calls the private grow() method to double the capacity of the elements array.

System.arraycopy() efficiently copies the elements to the larger array, ensuring no data loss during the expansion. This doubling strategy results in amortized time complexity for additions.

Other growth strategies are possible but are beyond the scope of this course. Overflow cases (where newCapacity calculation could result in a number greater than the maximum array size) are also ignored in this simplified version.

toString()

There are two possible ways of implementing toString(), a simpler version using Arrays.toString() and a more manual but faster and slightly more detailed version:

// Simpler version, good enough:
public String toString(){
    return Arrays.toString(elements);
}
 
// More sophisticated version for this context, better version:
public String toString() {
    if (size == 0) {
        return "[]";
    } else {
        String result = "[" + elements[0];
        
        //Iterates only up to size (not capacity)
        for (int i = 1; i < size; i++) {
            result += ", " + elements[i];
        }
        return result + "]";
    }
}

The better version iterates only up to size, not the full elements.length (capacity). This avoids printing null values for unused positions in the array. Using a StringBuilder for larger lists is even more efficient in practice, but omitted here for simplicity.

remove()

There are two options:

// First option: Just shortening the size variable
public E remove() {
    if (size == 0) {
        throw new NoSuchElementException();
    }
    E last = elements[size - 1];
    size--;
    return last;
}
 
// Better Option: 
// V1 + removing reference to prev last elem
public E remove() {
    if (size == 0) {
        throw new NoSuchElementException();
    }
    size--;
    E last = elements[size];
    elements[size] = null; // Allow garbage collection
    return last;
}

Removes the last element of the list, adjusting size and returning removed element. It throws NoSuchElementException if list is empty.

The second version, in addition to reducing size, explicitly sets the last element to null. This crucial step is necessary for garbage collection. Without it, the removed object might still be reachable through the elements array, preventing the garbage collector from reclaiming the memory, potentially causing memory leaks. Though not highly relevant to introductory programming, understanding this is important for sound memory management.

Ideas for Additional Methods

You could extend this implementation by adding more methods like ArrayList(int capacity), add(int index, E element), remove(int index), indexOf(E element), and remove(E element). Consider the effects of index-based operations on the order and positioning of elements.

Continue here: 24 Type Erasure, Advanced Generics (Covariance, Contravariance, Invariance), Type Bounds, Packages