Lecture from: 03.12.2024 | Video: Videos ETHZ
The Java Collections Framework
Interface List
Iterating and Modifying Collections: Caution!
Let us continue with the previous example of intersect
.
Example Intersection
For variant 1 see previous lecture note…
Variant 2 (Modifying a copy)
Notice the crucial i--
after remove()
to adjust the index since the list size changes.
Variant 3 (Using retainAll
)
The retainAll()
method elegantly performs the intersection operation.
Searching in Lists
-
LinkedList
: Searching usingcontains(Object)
is inefficient (linear time complexity, ). -
ArrayList
: Searching withcontains(Object)
orindexOf(Object)
is also inefficient (linear time, ), but if the list is sorted a more efficient search is possible (binary search). -
Sorted
ArrayList
: If anArrayList
is sorted, you can useCollections.binarySearch()
which significantly improves search performance (logarithmic time, ).
Important: Binary search only works correctly on sorted lists. Using binary search on an unsorted list will produce unpredictable results. Binary search although allowed with some extra steps, is not efficient for a LinkedList
due to its non-random access characteristics.
Comparing Elements: compareTo
and Comparable
Comparing elements is essential for sorting and other operations. Java provides mechanisms for establishing a natural ordering for objects.
Equality vs. Ordering
-
Equality (
==
,equals()
): Used to determine if two objects are equal in value. The==
operator compares references, andequals()
should be overridden to compare object content. -
Natural Ordering (
<
,>
,compareTo()
): Used to establish an ordering between objects (less than, greater than, or equal). The<
and>
operators are not applicable to reference types, so Java uses thecompareTo()
method.
The Comparable
Interface
The Comparable<E>
interface defines a single method, compareTo(E other)
, which is used to compare the current object to another object of the same type. Note that the implementation of compareTo
should be consistent with the implementation of equals()
in your custom classes, in particular the sign of compareTo()
should match the boolean return value of equals()
. That is: if e1.compareTo(e2) == 0
, then we want e1.equals(e2) == true
and vice versa. This consistency ensures that collections can function predictably when using compareTo
and equals
.
compareTo()
in the String
Class
The String
class implements Comparable
and provides a natural ordering based on lexicographical (alphabetical) order.
Implementing compareTo()
for Custom Classes
For your own classes, you need to implement the Comparable
interface and provide a compareTo()
method if you want to be able to compare and sort instances of those classes.
Example:
This example shows how to implement compareTo()
for a Person
class by delegating the comparison to the name
attribute (a String
).
Sorting with java.util.Collections
The Collections.sort()
method can be used to sort List
objects that contain elements of a Comparable
type.
The Collections.sort()
method uses the compareTo()
method defined by each object in the list to determine the correct order and modifies the original list.
The Comparable
Interface for Sorting Custom Objects
To sort a list of custom objects using Collections.sort()
, your class must implement the Comparable
interface.
Example:
Since Person
did not implement Comparable
, the Collections.sort()
call would result in a compile-time error. Comparable
enforces a consistent comparison method used for sorting.
Another Example: Sortable 2D Points
The compareTo
method defines an ordering relation on two given points using their x and y coordinates.
Consistency of equals()
and compareTo()
For any class, the compareTo()
method should be consistent with the equals()
method.
-
Consistency Rule: If
e1.compareTo(e2) == 0
, thene1.equals(e2)
should betrue
, and vice-versa. This consistency is essential to maintain predictable behavior for data structures likeTreeSet
, where comparisons withcompareTo
andequals
are involved. -
Exception for
null
:e1.compareTo(null)
should throw aNullPointerException
, bute1.equals(null)
must always befalse
.
Summary: Sorting and Comparable
To use Collections.sort()
on a list of custom objects, your class must implement Comparable
and define a compareTo()
method consistent with equals()
. Typical compareTo
implementations often involve delegation (e.g., comparing based on a String
attribute) or defining comparisons based on multiple attributes.
Interface Set
The Set
interface represents a non-associative collection that does not allow duplicate elements. It extends the Collection
interface. Set
implementations use the equals()
method to determine element equality. The add()
method returns false
if you try to add an element that is already present in the set (according to the equals()
method). Sets are typically unordered, meaning that there’s no guarantee about the sequence in which elements are retrieved during iteration. Note: LinkedHashSet
offers a predictable iteration order, but this is not the focus of this part of the lecture.
Typical Set Operations
Common set operations include intersection, union, and difference:
-
Intersection:
a.retainAll(b)
modifies seta
to contain only the elements that are also present in setb
. -
Union:
a.addAll(b)
adds all elements of setb
to seta
. Note: sinceb
is a set it does not contain duplicate elements. -
Difference:
a.removeAll(b)
removes all elements of setb
from seta
.
Set Implementations
The Java Collections Framework provides several Set
implementations:
-
HashSet
: Uses a hash table for efficient element lookup (on average foradd()
,remove()
, andcontains()
). -
TreeSet
: Stores elements in a sorted order using a balanced binary search tree ( time complexity for most operations). -
LinkedHashSet
: Maintains insertion order during iteration, but we will not cover this now.
TreeSet
: Idea and Example
TreeSet
elements are stored in a balanced binary search tree, which ensures efficient insertion, deletion, and search (). TreeSet
requires an ordering relation between elements. This means the element type must implement the Comparable
interface or you must provide a Comparator
(more on this later). TreeSet
is efficient for ordered set operations.
Example:
Note that a TreeSet
will store its elements in ascending order based on their natural ordering by default. The natural ordering is given by the objects’ compareTo()
methods. In this specific example, the integers would be output in their ascending numerical order.
TreeSet
: Importance of compareTo()
and Runtime Exception
If you use TreeSet
with a custom class that doesn’t implement Comparable
, you’ll get a ClassCastException
at runtime, not a compile-time error. This can be tricky to debug.
Recommendation: Always implement both equals()
and compareTo()
(and hashCode()
, covered later) when designing classes that might be stored in collections like TreeSet
.
TreeSet
: Order-Based Operations
TreeSet
provides methods for order-based operations, exploiting its sorted nature:
descendingSet()
: Returns a view of the set in reverse order.first()
,last()
: Return the smallest and largest element, respectively.lower(E e)
,higher(E e)
: Return the largest element strictly less thane
, and the smallest element strictly greater thane
, respectively.floor(E e)
,ceiling(E e)
: Return the largest element less than or equal toe
, and the smallest element greater than or equal toe
, respectively.headSet(E e)
,tailSet(E e)
: Return a subset containing elements strictly less thane
, and greater than or equal toe
, respectively. There are also versions of these methods which include the elemente
in the subset.
More Info here: https://docs.oracle.com/en/java/javase/21/docs/api/java.base/java/util/TreeSet.html
HashSet
: Idea and Example
HashSet
uses a hash table for efficient element storage and retrieval. Elements do not maintain a specific order. HashSet
requires a properly implemented hashCode()
method (discussed below) rather than compareTo()
, but it’s recommended to implement a proper equals
as well for better control of duplicate handling.
Example:
Hashing and hashCode()
Hashing is a technique used by HashSet
and HashMap
to efficiently store and retrieve elements. A hash function maps an object to an integer value (the hash code). Hash codes are used to determine the “bucket” in the hash table where an element should be stored. This allows for very fast average case lookups (). The hash code must not change over time unless the object is specifically mutated.
-
The
hashCode()
Method: Defined in theObject
class,hashCode()
returns anint
representing the object’s hash code. All Java classes automatically inherit a default implementation ofhashCode()
from Object. -
Overriding
hashCode()
: For custom classes used in hash-based collections, you should overridehashCode()
so that:- Equal objects (according to
equals()
) have the same hash code. - Unequal objects have different hash codes. This minimizes collisions and improves the distribution of elements in the hash table. (But not strictly required)
- Equal objects (according to
-
Ideal
hashCode()
Implementation: Ideally, a hash function maps objects to unique integers. This minimizes collisions and maximizes performance. However, in practice, collisions may still happen, and hash tables need to deal with collision resolution. For hash set this occurs by storing multiple elements which have the same hash value in a list within the corresponding hash bucket. -
Implementing
hashCode()
: The default implementation ofhashCode()
is fine as long as you are comparing objects via their memory addresses. If however, you have overridden theequals()
a method for meaningful equality of your custom objects, you absolutely have to also implement a properhashCode()
method for hash based sets to work correctly! -
Overriding
hashCode()
andequals()
: If you override eitherhashCode()
orequals()
, you should always override the other as well to ensure that these two methods remain consistent.
Example Implementation:
HashSet
Warning: A non-trivial hash function is crucial for the efficiency of a HashSet
. HashSet
is not suitable for queries that rely on order (e.g. finding the minimum element of a set). Such operations are not possible (or very inefficient) as HashSet
elements are not sorted or ordered. Use a TreeSet
for order-related operations or operations where uniqueness is important. For just insertion, removal and checks whether some element is contained, a hash set is a good choice.
Continue here: 22 Java Collection Framework, Interface Set, Interface Map, Comparator Interface