02 Java Recap and JVM Overview

Lecture from: 20.02.2024 | Video: Video ETHZ

This lecture provides a recap of Java and the Java Virtual Machine (JVM), focusing on aspects relevant to concurrency and parallelism.

Java

Why Java?

Widely Used: Java is a prevalent programming language in both academia and industry.
Downstream Courses: Many subsequent courses rely on Java.
Abundant Resources: Extensive online and in-print resources are available for learning Java.
Concurrency Support: Java offers sophisticated language features and libraries for concurrent programming.

Overview

Java achieves platform independence through bytecode interpretation.

Platform Independence: Java programs, in theory, can run on any computing device (PC, mobile phones, embedded systems, etc.) with a JVM.
Bytecode: The Java compiler translates source code (.java files) into bytecode (.class files).
JVM Interpretation: The Java Virtual Machine (JVM) interprets and executes the bytecode.

The Java Virtual Machine (JVM)

We’ll explore key components of the JVM and their roles in executing Java programs.

Our focus will be on:

Resolver/Loader: Responsible for loading class files.
Bytecode Verification: Ensures the bytecode adheres to security and type constraints.
Bytecode Interpreter: Executes the bytecode instructions.
JIT Compiler: Optimizes performance by compiling frequently executed bytecode to native machine code.

These components interact with:

Memory Allocators: Manage memory allocation for Java objects.
Garbage Collectors: Reclaim unused memory.
Portability Layer: Provides an abstraction layer for interacting with the underlying operating system.
Native Interface: Enables Java code to interact with native (non-Java) code.

Resolver, Loader

The Resolver/Loader loads class files and sets up their internal memory representation. A crucial question is when this loading occurs.

Eager Loading: The class is loaded as soon as it’s referenced (e.g., when another class that uses it is loaded).
Lazy Loading: The class is loaded only when it’s actually needed (e.g., when an object of that class is created).

Most JVMs use lazy loading to improve startup time and reduce memory usage. However, lazy loading in a concurrent environment introduces complexities. If multiple threads attempt to use a class for the first time concurrently, the JVM must ensure that the class is initialized only once, avoiding race conditions and redundant initialization. Static initialization of a class is a non-trivial operation: it can be triggered concurrently by many Java threads, so the JVM lets exactly one thread run the static initializer while the others wait until it has finished.
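The once-only initialization guarantee can be observed with a small sketch. The names `InitOnceDemo`, `Config`, and `touchFromThreads` are hypothetical; the idea is that several threads using `Config` for the first time at roughly the same moment should still cause its static initializer to run exactly once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class InitOnceDemo {
    // Counts how many times Config's static initializer has run.
    static final AtomicInteger initRuns = new AtomicInteger();

    // Hypothetical class whose (lazy) initialization we want to observe.
    static class Config {
        // A blank final assigned in the static block is not a compile-time
        // constant, so reading it really does trigger class initialization.
        static final int VALUE;

        static {
            initRuns.incrementAndGet();
            VALUE = 42;
        }
    }

    // Starts nThreads threads that all use Config, waits for them,
    // and returns how often the static initializer ran.
    static int touchFromThreads(int nThreads) {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            threads.add(new Thread(() -> {
                int v = Config.VALUE; // first use may trigger initialization
            }));
        }
        for (Thread t : threads) {
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return initRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("static initializer ran " + touchFromThreads(8) + " time(s)");
    }
}
```

No explicit locking appears in the code: the JVM's class initialization protocol makes the other threads wait while one thread runs the `static` block.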

Bytecode Verification

Bytecode verification is a crucial security feature of the JVM. It automatically checks that the bytecode provided to the JVM satisfies specific security and type constraints. This verification is typically performed after the class is loaded but before static initialization.

The verifier checks for:

Type Safety: Ensures that bytecode operations are performed on appropriate data types.
No Illegal Casts: Prevents invalid type conversions.
No Pointer Manipulation: Java bytecode does not allow direct manipulation of memory addresses (unlike C/C++).
Access Restrictions: Enforces access modifiers (e.g., preventing direct access to private methods of other classes).
Control Flow Integrity: Prevents jumps to arbitrary locations within a method.
…and other security checks.

Undecidability: A fundamental limitation is that automated verification is undecidable. This means there’s no algorithm that can perfectly determine, for all possible programs, whether they satisfy the security constraints.

Practical Implications: Consequently, the bytecode verifier may reject valid programs that, in reality, do adhere to the constraints. The design goal is to create a verifier that accepts as many valid programs as possible while maintaining a high level of security.

Bytecode Interpreter

The bytecode interpreter is a program within the JVM that executes the bytecode instructions in the class files. It uses a stack and local variable storage.

Stack-Based Architecture: The JVM is a stack-based abstract machine. Bytecode instructions operate by pushing values onto the stack, performing operations, and popping values from the stack.
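Local Variables: Local variables and method parameters are stored in a numbered array of local variable slots, referred to here as registers. Bytecode instructions load values from these registers onto the stack and store values from the stack back into them.
Method Metadata: The class file specifies the maximum stack depth and the number of registers required for each method (the max_stack and max_locals fields of the method's Code attribute).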
Typed Bytecode: Most JVM bytecodes are typed, meaning they operate on specific data types (e.g., integer, float, object reference).
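To make the stack-based model concrete, here is a minimal sketch (the class `StackDemo` is hypothetical). The comments show the bytecode that `javap -c StackDemo` should print for `add`, together with the operand stack after each instruction. Because `add` is static, its parameters start at register 0; in an instance method, register 0 holds `this`, which is why instance methods load their first parameter with `iload_1`.

```java
public class StackDemo {
    // Parameters a and b occupy local-variable slots (registers) 0 and 1.
    // Expected `javap -c` output for this method (stack shown on the right):
    //   0: iload_0   // push register 0 (a)          stack: [a]
    //   1: iload_1   // push register 1 (b)          stack: [a, b]
    //   2: iadd      // pop two ints, push a + b     stack: [a+b]
    //   3: ireturn   // pop the int and return it    stack: []
    // `javap -v` additionally reports the method metadata: stack=2, locals=2.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
    }
}
```

Note how the typed instructions (`iload`, `iadd`, `ireturn`) all carry the `i` prefix for `int`; a `double` version would use `dload`, `dadd`, and `dreturn` instead.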

The bytecode interpreter is relatively slow: every instruction must be decoded and dispatched in software, and values travel through the operand stack instead of being kept in machine registers.

Just-In-Time Compiler (JIT)

The JIT compiler addresses the performance limitations of the bytecode interpreter. It compiles bytecode to native machine code (e.g., x86, ARM) on-demand, typically when a method is frequently executed (a “hot method”).

Dynamic Compilation: Compilation happens during program execution.
Profiling: The JIT compiler uses profiling data to identify hot methods. Gathering this profiling data can be expensive.
Optimizations: Modern JIT compilers perform numerous optimizations, such as:
- Inlining: Replacing method calls with the method’s body.
- Register Allocation: Optimizing the use of machine registers.
- Dead Code Elimination: Removing code that doesn’t affect the program’s result.
- …and many others.
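A program with an obviously hot method is a convenient way to watch the JIT at work. The sketch below is hypothetical (`HotLoop`, `square`, `sumOfSquares`). On a HotSpot-based JVM, running it with `java -XX:+PrintCompilation HotLoop` lists methods as they get compiled; the exact output, and whether `square` ends up inlined, depends on the JVM version and its heuristics.

```java
public class HotLoop {
    // Tiny method: a typical candidate for inlining into its caller.
    static int square(int x) {
        return x * x;
    }

    // Called thousands of times from main, so it is likely to become "hot".
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i % 1000);
        }
        return sum;
    }

    public static void main(String[] args) {
        long total = 0;
        for (int round = 0; round < 10_000; round++) {
            total += sumOfSquares(1_000);
        }
        System.out.println(total);
    }
}
```

The program's result is the same whether a method is interpreted or compiled; only the speed changes, which is what makes dynamic compilation transparent to the programmer.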

Memory Allocators

Memory allocators are invoked whenever a Java program creates a new object (using the new keyword). The JVM's allocator often obtains memory from the underlying operating system and then manages it internally. Because multiple Java threads can allocate memory simultaneously, the allocator's algorithms frequently need to be concurrent; a sequential allocator could become a significant bottleneck and cause substantial pause times.

Garbage Collectors (GC)

Garbage collection (GC) is the process of automatically reclaiming memory that is no longer reachable by the program.

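The JVM employs various GC algorithms, many of which are concurrent and parallel, to minimize pauses and manage memory efficiently. GC frees the programmer from manual memory management, eliminating bugs such as dangling pointers and double frees. It does not eliminate all memory leaks, however: an object that is still reachable is never reclaimed, even if the program will never use it again.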

Different GC algorithms (generational, concurrent, parallel, mark-and-sweep, etc.) offer different trade-offs between throughput, pause times, and memory overhead. Concurrent GC algorithms are notoriously difficult to implement correctly. The finalize() method, if a class overrides it, may be called by the garbage collector at some point before an object is reclaimed, with no guarantee of when or even whether it runs; it has been deprecated since Java 9 and its use is discouraged.
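Because the collector tracks reachability rather than whether an object will ever be used again, a Java program can still hold on to memory it no longer needs. In this hypothetical sketch (`ReachabilityDemo`, `process`), buffers that are only referenced by a local variable become garbage when the method returns, whereas buffers added to a static list stay reachable and are never reclaimed.

```java
import java.util.ArrayList;
import java.util.List;

public class ReachabilityDemo {
    // Anything stored here stays reachable as long as the class is loaded,
    // so the garbage collector can never reclaim it.
    static final List<byte[]> retained = new ArrayList<>();

    static void process(boolean keep) {
        byte[] buffer = new byte[1024]; // allocated on every call
        if (keep) {
            retained.add(buffer);       // still reachable after process() returns
        }
        // Otherwise, once process() returns, nothing refers to buffer any more:
        // it is unreachable and eligible for garbage collection.
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            process(i % 10 == 0);
        }
        System.out.println("buffers still reachable: " + retained.size());
    }
}
```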

Native Interface

The native interface allows Java code to interact with code written in other languages (typically C or C++). When a Java program calls a native method, the JVM must convert the parameters from the JVM’s internal representation (e.g., values on the stack) to the calling convention expected by the native code (e.g., values in machine registers).

The JVM needs to handle parameter passing and type conversions to ensure correct interaction with the native code on the specific platform. While not a large module, the native interface can be tricky to implement correctly, especially regarding type safety. The Java class library itself declares many native methods (e.g., Object.hashCode() in java.lang.Object, or the internal method that java.lang.Thread uses to start a thread), for which the JVM provides an internal implementation.
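On the Java side, a native method is just a declaration with the `native` modifier and no body; the implementation must be supplied by a native library, normally loaded with `System.loadLibrary`. The sketch below is hypothetical (`NativeDemo`, `nativeAdd`) and deliberately loads no library, so calling `nativeAdd` fails at run time with an `UnsatisfiedLinkError`, whereas `Object.hashCode()`, which is also declared `native`, works because the JVM supplies its implementation.

```java
public class NativeDemo {
    // Declared in Java, implemented elsewhere (e.g. in C via JNI).
    // No native library is loaded in this sketch, so there is no implementation.
    static native int nativeAdd(int a, int b);

    static String tryNativeCall() {
        try {
            return "result: " + nativeAdd(2, 3);
        } catch (UnsatisfiedLinkError e) {
            // The JVM could not find native code for nativeAdd.
            return "not linked";
        }
    }

    public static void main(String[] args) {
        // Object.hashCode() is a native method implemented inside the JVM.
        System.out.println(new Object().hashCode());
        System.out.println(tryNativeCall());
    }
}
```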

Portability Layer

The portability layer provides an abstraction that allows the JVM to run on different operating systems (Windows, Linux, etc.) and architectures (x86, ARM, etc.). The JVM designer implements a small number of JVM constructs (like synchronization and threading) using the primitives provided by the underlying operating system and architecture.

For example, Java has its own notion of a thread. The portability layer maps this Java thread concept to the specific thread implementation of the underlying operating system, ensuring consistent behavior across different platforms. If you want to port the JVM to a new operating system, you primarily need to implement the portability layer.

Looking Inside the JVM: `javap`

The javap tool is a disassembler that allows you to inspect the bytecode of a compiled Java class. While not examinable, understanding bytecode can provide deeper insights into how Java code executes.

We’ll see later how javap can be helpful for understanding constructs like synchronized and volatile. The meaning of all JVM instructions can be found in the Java Virtual Machine Specification.
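As a preview, the hypothetical class `SyncDemo` below can be compiled with `javac SyncDemo.java` and disassembled with `javap -c SyncDemo` (instructions) or `javap -v -p SyncDemo` (verbose output, including access flags of private members). The instruction and flag names in the comments come from the JVM specification; the class itself is only an illustration.

```java
public class SyncDemo {
    // volatile is recorded as an access flag (ACC_VOLATILE) on the field.
    private volatile boolean done = false;
    private int count = 0;

    // A synchronized block compiles to explicit monitorenter/monitorexit
    // instructions (plus an extra monitorexit on the exception path).
    void incrementBlock() {
        synchronized (this) {
            count++;
        }
    }

    // A synchronized method has no monitorenter/monitorexit in its body;
    // its ACC_SYNCHRONIZED flag tells the JVM to acquire the monitor of `this`.
    synchronized void incrementMethod() {
        count++;
    }

    synchronized int getCount() {
        return count;
    }

    void finish() { done = true; }

    boolean isDone() { return done; }

    public static void main(String[] args) {
        SyncDemo d = new SyncDemo();
        d.incrementBlock();
        d.incrementMethod();
        d.finish();
        System.out.println(d.getCount() + " " + d.isDone());
    }
}
```

Both increment methods lock the same monitor (`this`), so they exclude each other; the difference is only in how the locking shows up in the bytecode.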

Bytecode Example

Let’s examine a simple Java class and its corresponding bytecode.

The Test.class file contains the bytecode representation of the Test.java source code.

This slide shows the bytecode for the class constructor, the pp method, and the static initializer. We can see instructions like aload_0 (push the ‘this’ reference onto the stack), invokespecial (call a special method, in this case, the superclass constructor), iload_1 (push a local integer variable onto the stack), i2d (convert an integer to a double), sipush (push a short integer constant onto the stack), and putstatic (store a value in a static field).

This slide shows the bytecode for the main method. It demonstrates object creation (new), method invocation (invokevirtual, invokestatic), field access (getstatic, putfield), and stack manipulation (dup, dadd, pop). It also shows the invocation of a native method (invokestatic #7).

Java Recap

Now, let’s recap some core Java concepts relevant to concurrency.

Structure of a Java Program

A Java program consists of classes. An executable Java program must have a main method, which contains the statements (commands) to be executed. Methods are named groups of statements.

Keywords

Keywords are reserved identifiers with predefined meanings in Java (e.g., class, public, static, void, int, if, else, for, while, synchronized, volatile).

Creating, Compiling, and Running Programs

This slide illustrates the process: source code is written, compiled into bytecode using javac, and then executed by the JVM using java.
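A minimal sketch of the workflow, using a hypothetical `Hello` class. The commands in the comments are the standard JDK tools; the file name must match the public class name.

```java
// File: Hello.java
//   Compile to bytecode:  javac Hello.java   (produces Hello.class)
//   Run on the JVM:       java Hello         (class name, without .class)
public class Hello {
    static String greeting(String name) {
        return "Hello, " + name + "!";
    }

    // Entry point: the JVM starts executing the program here.
    public static void main(String[] args) {
        System.out.println(greeting("JVM"));
    }
}
```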

Different Kinds of Errors

Compiler errors: Detected by the compiler (e.g., syntax errors, type errors).
Runtime errors: Occur during program execution (e.g., division by zero, null pointer dereference).
Logic errors: The program runs but produces incorrect results.
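The three kinds of errors in one hypothetical class (`ErrorKinds`). The compiler error is left in a comment, since a file containing it would not compile.

```java
public class ErrorKinds {
    // Compiler error (commented out, otherwise the file would not compile):
    //   int x = "text";   // incompatible types: String cannot be converted to int

    // Runtime error: integer division by zero throws ArithmeticException.
    static String divide(int a, int b) {
        try {
            return String.valueOf(a / b);
        } catch (ArithmeticException e) {
            return "runtime error: " + e.getMessage();
        }
    }

    // Logic error: compiles and runs, but (a + b) / 2 is integer division,
    // so average(1, 2) returns 1.0 instead of 1.5.
    static double average(int a, int b) {
        return (a + b) / 2;
    }

    // Fixed version: dividing by 2.0 forces floating-point division.
    static double averageFixed(int a, int b) {
        return (a + b) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(divide(10, 2));
        System.out.println(divide(1, 0));
        System.out.println(average(1, 2) + " vs " + averageFixed(1, 2));
    }
}
```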

Data Types

A type defines a category or set of data values and constrains the operations that can be performed on that data.

Primitive Types in Java

Java has eight primitive types: byte, short, int, long, float, double, char, and boolean. Java distinguishes between integers and real numbers due to differences in internal representation and precision.

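Complete list of primitive types:

byte: 8-bit signed integer
short: 16-bit signed integer
int: 32-bit signed integer
long: 64-bit signed integer
float: 32-bit IEEE 754 floating-point number
double: 64-bit IEEE 754 floating-point number
char: 16-bit unsigned UTF-16 code unit
boolean: true or false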

Type Conversion (Casting)

Java is a statically and strongly typed language: the compiler enforces type compatibility. Conversions that might lose information require an explicit cast (e.g., casting a float to an int discards the fractional part, rounding toward zero), while widening conversions such as int to double happen implicitly.
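A short sketch of widening and narrowing conversions (the class and helper names are hypothetical). The expected values in the comments follow from Java's conversion rules: casting a floating-point value to `int` truncates toward zero, and casting a `long` to `int` keeps only the low 32 bits.

```java
public class CastDemo {
    // Narrowing double -> int needs an explicit cast; the fraction is discarded.
    static int toInt(double d) {
        return (int) d;
    }

    // Narrowing long -> int keeps only the low 32 bits, which can change the value.
    static int toIntFromLong(long x) {
        return (int) x;
    }

    public static void main(String[] args) {
        int i = 42;
        double widened = i;          // widening: implicit, no cast required
        // int bad = 3.99;           // compile error: possible lossy conversion from double to int
        System.out.println(widened);                        // 42.0
        System.out.println(toInt(3.99));                    // 3
        System.out.println(toInt(-3.99));                   // -3 (truncation toward zero)
        System.out.println((int) 2.7f);                     // 2 (float -> int)
        System.out.println(toIntFromLong(3_000_000_000L));  // -1294967296
    }
}
```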

Control Flow

The Two-way `if` Statement

The if-else statement allows conditional execution based on a boolean expression.

Nested `if/else`

Nested if/else statements allow for multiple tests and different outcomes. In concurrent programming, it is generally preferable for if/else conditions to depend on local variables rather than on shared fields, because another thread may change a shared field between one test and the next, which can lead to race conditions.
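A minimal sketch of the local-variable pattern, assuming a hypothetical shared field `threshold` that some other thread might update. Reading it once into a local makes all branches of the nested if/else test against the same value; reading `threshold` in each condition could mix an old and a new value. This only makes the tests consistent with each other; making the shared field itself safe to access from several threads requires synchronization, which later lectures cover.

```java
public class LocalCopyDemo {
    // Hypothetical shared setting that another thread could change at any time.
    static int threshold = 10;

    static String classify(int value) {
        // Read the shared field exactly once; every test below uses the same t.
        int t = threshold;
        if (value < t) {
            return "low";
        } else if (value < 2 * t) {
            return "medium";
        } else {
            return "high";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(5) + " " + classify(15) + " " + classify(25));
    }
}
```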

Categories of Loops

Bounded loop: Executes a known number of times (e.g., for loop).
Unbounded loop: The number of iterations is not known in advance (e.g., while loop).

The `while` Loop

The while loop repeatedly executes a block of code as long as a condition is true.
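One bounded and one unbounded loop, as a small sketch with hypothetical method names.

```java
public class LoopDemo {
    // Bounded loop: the number of iterations (n) is known before the loop starts.
    static int sumTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i;
        }
        return sum;
    }

    // Unbounded loop: how often the body runs depends on the data.
    // Assumes n >= 0.
    static int countDigits(int n) {
        int digits = 1;
        while (n >= 10) {
            n /= 10;
            digits++;
        }
        return digits;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(10) + " " + countDigits(12345));
    }
}
```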

Arrays

An array is an object that stores a collection of values of the same type. Elements are accessed using a 0-based index.

Multidimensional Arrays

Multidimensional arrays are represented as arrays of arrays. Example: int[][] pp = new int[10][20];
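Because a two-dimensional array is an array of references to row arrays, the rows do not have to be the same length. The sketch below (hypothetical names `ArrayDemo` and `triangle`) shows one-dimensional indexing, the rectangular `pp` array from the text, and a "jagged" array whose rows are allocated one by one.

```java
import java.util.Arrays;

public class ArrayDemo {
    // Builds a jagged 2D array: row i has i + 1 elements.
    static int[][] triangle(int rows) {
        int[][] t = new int[rows][];   // allocates only the outer array
        for (int i = 0; i < rows; i++) {
            t[i] = new int[i + 1];     // each row is a separate int[] object
        }
        return t;
    }

    public static void main(String[] args) {
        int[] a = new int[5];          // elements default to 0
        a[0] = 7;                      // valid indices: 0 .. a.length - 1
        // a[5] = 1;                   // would throw ArrayIndexOutOfBoundsException

        int[][] pp = new int[10][20];  // 10 references to int[20] arrays
        pp[9][19] = 1;                 // last row, last column
        System.out.println(Arrays.toString(a) + " " + pp.length + "x" + pp[0].length);
        System.out.println(Arrays.deepToString(triangle(3)));
    }
}
```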

Strings

A String object represents a sequence of characters. Unlike most other objects, a String can be created from a string literal (e.g., "hello") without using the new keyword.

Characters in a string are accessed using 0-based indexes.
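A small sketch of string creation and indexing (`StringDemo` and `lastChar` are hypothetical names). It also shows that methods such as `toUpperCase` return a new String rather than modifying the original, because String objects are immutable.

```java
public class StringDemo {
    // Returns the last character using 0-based indexing.
    static char lastChar(String s) {
        return s.charAt(s.length() - 1);
    }

    public static void main(String[] args) {
        String s = "hello";               // created from a literal, no `new`
        String t = new String("hello");   // explicit construction: a distinct object
        System.out.println(s.charAt(0));  // first character is at index 0
        System.out.println(lastChar(s));
        System.out.println(s.equals(t));  // compares contents
        System.out.println(s == t);       // compares references: different objects
        String upper = s.toUpperCase();   // returns a new String; s is unchanged
        System.out.println(upper + " " + s);
    }
}
```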

Objects

An object encapsulates data (fields) and behavior (methods). Interaction with an object occurs through its methods; the internal data is hidden (encapsulation).
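A hypothetical `Account` class as a sketch of encapsulation: the balance is private, so the only way to change it is through `deposit`, which can enforce an invariant (here, that deposits are positive).

```java
public class Account {
    // Private field: hidden from other classes.
    private long balance;

    // Behavior: all changes go through this method, which checks its input.
    public void deposit(long amount) {
        if (amount <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        balance += amount;
    }

    public long getBalance() {
        return balance;
    }

    public static void main(String[] args) {
        Account acc = new Account();
        acc.deposit(100);
        acc.deposit(50);
        System.out.println(acc.getBalance());
    }
}
```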

Object References

Arrays and objects use reference semantics. Assigning one object variable to another does not create a copy of the object; it creates another reference to the same object. This is important for both efficiency (avoiding large copies) and sharing data between different parts of a program.
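A minimal sketch of aliasing with arrays (names hypothetical): assigning `a` to `b` copies the reference, so a write through `b` is visible through `a`, while `clone()` creates a separate (shallow) copy.

```java
import java.util.Arrays;

public class AliasDemo {
    static int[] demo() {
        int[] a = {1, 2, 3};
        int[] b = a;          // copies the reference: a and b are aliases
        b[0] = 99;            // visible through a as well
        int[] c = a.clone();  // a new array object with the same contents
        c[1] = -1;            // does not affect a
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```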

Pass by Reference Value

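When an object is passed as a parameter to a method, the object itself is not copied; Java passes a copy of the reference (the reference value). The parameter therefore refers to the same object as the caller's variable, so if the method modifies the object, the changes are visible to the caller. Reassigning the parameter to a different object, however, only changes the method's local copy of the reference and does not affect the caller.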
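Both halves of the rule in one sketch (hypothetical `PassDemo`): mutating the object through the parameter is visible to the caller, but reassigning the parameter is not.

```java
import java.util.ArrayList;
import java.util.List;

public class PassDemo {
    // The parameter holds a copy of the caller's reference: same list object.
    static void addItem(List<String> list) {
        list.add("added");         // visible to the caller
    }

    // Reassigning the parameter changes only the local copy of the reference.
    static void replace(List<String> list) {
        list = new ArrayList<>();  // caller's variable still points to the old list
        list.add("lost");
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        addItem(items);
        replace(items);
        System.out.println(items); // [added]
    }
}
```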

Static Variables, Constants, and Methods

Static variables: Shared by all instances of a class.
Static methods: Not tied to a specific object.
Static constants: static final variables, shared by all instances and never reassigned.
Concurrency issues: Shared mutable state (like non-final static variables) can lead to concurrency issues if accessed by multiple threads without proper synchronization.

Instance Variables, and Methods

Instance variables belong to an individual instance of the class; each object has its own copy.
Instance methods are invoked on an instance of the class and can access that instance's variables.
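A sketch contrasting static and instance members, using a hypothetical `Widget` class. The static counter is exactly the kind of shared mutable state that becomes a problem with threads: two threads running the constructor at the same time could both read the same value of `created` and lose an update.

```java
public class Widget {
    // Static constant: one copy, never reassigned, safe to share.
    static final int MAX_WIDGETS = 100;

    // Static variable: one copy shared by all Widget objects.
    static int created = 0;

    // Instance variable: every Widget has its own id.
    private final int id;

    Widget() {
        created++;   // not thread-safe: concurrent constructors may lose updates
        id = created;
    }

    // Instance method: called on a particular Widget.
    int getId() {
        return id;
    }

    // Static method: called on the class; there is no `this`.
    static int getCreated() {
        return created;
    }

    public static void main(String[] args) {
        Widget w1 = new Widget();
        Widget w2 = new Widget();
        System.out.println(w1.getId() + " " + w2.getId() + " of " + getCreated()
                + " (max " + MAX_WIDGETS + ")");
    }
}
```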

Exceptions

Exceptions provide a mechanism for handling errors. A method can throw an exception to its caller.
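A small sketch of throwing to the caller and catching there (`ExceptionDemo`, `parsePositive`, and `describe` are hypothetical). `NumberFormatException` is a subclass of `IllegalArgumentException`, so it has to be caught first.

```java
public class ExceptionDemo {
    // Reports problems to its caller instead of handling them itself.
    static int parsePositive(String s) {
        int value = Integer.parseInt(s);  // may throw NumberFormatException
        if (value <= 0) {
            throw new IllegalArgumentException("not positive: " + value);
        }
        return value;
    }

    // The caller decides how to react to each kind of failure.
    static String describe(String s) {
        try {
            return "ok: " + parsePositive(s);
        } catch (NumberFormatException e) {
            return "not a number";
        } catch (IllegalArgumentException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("5") + ", " + describe("abc") + ", " + describe("-3"));
    }
}
```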

Language Features vs. Parallelism: Guidelines

Local Variables: Favor local variables over shared state (static fields or fields of shared objects) to minimize the scope of shared data and reduce potential race conditions.
Avoid Aliasing: Minimize aliasing (multiple references to the same object) to prevent unexpected updates through seemingly unrelated variables.
Avoid Mutable State: Prefer immutable objects (objects whose state cannot be changed after creation) when possible, especially when shared between threads. Immutable objects eliminate the possibility of concurrent modification issues.
Exceptions and Concurrency: Exceptions can be less effective in parallel programs because the root cause of an error might be in a different thread and far removed from where the exception is caught. Exceptions thrown in a thread don’t automatically propagate to the main thread.
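The last point can be seen in a small sketch (threads are introduced properly in the next lecture; `ThreadExceptionDemo` and `runWorker` are hypothetical names). The worker's exception does not reach the thread that started it: `join()` returns normally. Here the failure is captured with `Thread.setUncaughtExceptionHandler`, which runs in the worker thread before it terminates; an `ExecutorService` offers another route, where `Future.get()` rethrows the failure wrapped in an `ExecutionException`.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadExceptionDemo {
    static String runWorker() {
        AtomicReference<Throwable> failure = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("worker failed");
        });
        // Without this handler, the exception would only be printed by the
        // worker thread; a try/catch around start()/join() would never see it.
        worker.setUncaughtExceptionHandler((t, e) -> failure.set(e));
        worker.start();
        try {
            worker.join(); // returns normally even though the worker threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Throwable e = failure.get();
        return e == null ? "no failure" : e.getMessage();
    }

    public static void main(String[] args) {
        System.out.println("captured: " + runWorker());
    }
}
```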

Continue here: 03 Introduction to Threads and Synchronization (Part I)
