Lecture from: 23.03.2023 | Video: YT
This lecture continues our deep dive into Instruction Set Architectures (ISAs) and Microarchitecture. We aim to solidify our understanding of the ISA-Microarchitecture boundary and explore more advanced concepts related to instruction processing and architectural trade-offs. In Lecture 9b (released separately), we will shift focus to Assembly Programming, which will be particularly relevant for your upcoming labs.
Agenda for Today & Next Few Lectures
This lecture and the next few will cover the following topics, building upon the foundations laid in previous lectures:
- The von Neumann Model: A continued discussion to deepen our understanding.
- LC-3: An Example of a von Neumann Machine: Referencing LC-3 as a concrete example to illustrate architectural concepts.
- LC-3 and MIPS Instruction Set Architectures: Further exploration and comparison of these two ISAs.
- LC-3 and MIPS Assembly and Programming: (Covered in Lecture 9b, relevant for labs).
- Introduction to Microarchitecture and Single-Cycle Microarchitecture: Beginning our exploration of microarchitectural design.
- Multi-Cycle Microarchitecture: Expanding on microarchitectural implementations.
What Have We Been Learning? (Recap)
To contextualize today’s lecture, let’s briefly recap what we’ve been learning:
- Basic Elements of a Computer & the Von Neumann Model: Understanding the fundamental components and principles of computer architecture.
- Instruction Set Architectures: LC-3 and MIPS: Examining and comparing two distinct ISAs, focusing on their instructions, data types, and addressing modes.
- Instruction Formats: Analyzing how instructions are encoded in binary.
- Addressing Modes: Exploring different techniques for specifying operand locations.
For reference, the diagram of the LC-3 architecture is displayed again, highlighting its key components and their interconnections within the von Neumann framework.
As a reminder, instructions are categorized into three core types:
- Operate Instructions: Perform computations within the ALU.
- Data Movement Instructions: Handle data transfer between memory and registers.
- Control Flow Instructions: Alter the sequence of program execution.
![Slide 6: Recall: Instruction Types]
Instruction (Processing) Cycle: Quick Review
Let’s quickly revisit the Instruction (Processing) Cycle, the fundamental execution flow within a Von Neumann machine.
The Cycle’s Phases
The Instruction Cycle consists of six phases:
- FETCH: Retrieve the instruction from memory.
- DECODE: Interpret the instruction and determine control signals.
- EVALUATE ADDRESS: Calculate memory addresses for operands.
- FETCH OPERANDS: Obtain operands from registers or memory.
- EXECUTE: Perform the operation in the ALU.
- STORE RESULT: Write back the result to memory or registers.
Control of the Instruction Cycle via Finite State Machine (FSM)
The Instruction Cycle is managed by a Finite State Machine (FSM) within the Control Unit. The FSM dictates the sequence of operations and control signals for each instruction phase.
LC-3 and MIPS Instruction Set Architectures: Continuing the Discussion
We continue our comparison and analysis of LC-3 and MIPS ISAs, focusing on their instructions and design philosophies.
Instructions (Opcodes): Design Trade-offs
The design of the Instruction Set, particularly the choice of Opcodes, involves significant trade-offs:
- Simple vs. Complex Instructions: Balancing the complexity of individual instructions against the overall instruction count and code density.
- Hardware Complexity vs. Software Complexity: Complex ISAs can simplify software development but increase hardware complexity, and vice-versa.
- Latency Considerations: Balancing the execution speed of simple versus complex instructions.
Data Types and Addressing Modes: Completing the Picture
We will further explore Data Types and Addressing Modes in this lecture, examining their impact on ISA design and microarchitectural implementation.
Data Movement Instructions and Addressing Modes (Deep Dive)
Now, let’s delve deeper into Data Movement Instructions and Addressing Modes, particularly within the LC-3 architecture.
Data Movement Instructions in LC-3: Overview
LC-3 provides seven data movement instructions: `LD`, `LDR`, `LDI`, `LEA`, `ST`, `STR`, and `STI`. These instructions facilitate data transfer between memory and registers, with `LEA` being an exception (it loads an address, not data).
Format of Load and Store Instructions in LC-3
Load and store instructions in LC-3 share a common format:
- Opcode (bits [15:12]): Specifies the load or store operation.
- DR or SR (bits [11:9]): Destination Register (DR) for load operations, Source Register (SR) for store operations.
- Address Generation Bits (bits [8:0]): Used to calculate the memory address, interpreted according to the addressing mode (see the encoding sketch below).
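To make this format concrete, here is a sketch of how one `LD` instruction could be encoded; the label and offset value are chosen purely for illustration (the LC-3 `LD` opcode is 0010):

```asm
; Illustrative encoding of: LD R2, LABEL
; where LABEL sits 5 locations past the incremented PC
;
;   bits [15:12] = 0010        ; opcode: LD
;   bits [11:9]  = 010         ; DR = R2
;   bits [8:0]   = 000000101   ; PCoffset9 = +5 (sign-extended at runtime)
;
; Assembled word: 0010 010 000000101 = x2405
```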
Addressing Modes in LC-3: Four Ways to Interpret Address Bits
LC-3 offers four primary addressing modes to interpret the address generation bits:
- PC-Relative Mode: Address is calculated relative to the Program Counter.
- Indirect Mode: Address is fetched from a memory location pointed to by a PC-relative address.
- Base+Offset Mode: Address is calculated by adding an offset to a base address in a register.
- Immediate Mode (for LEA): Loads a calculated PC-relative address directly into a register, without memory access.
In contrast, MIPS simplifies load and store addressing to only Base+Offset and Immediate modes, as the sketch below shows.
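A minimal MIPS sketch of these two modes; the register names and offsets are illustrative:

```asm
# MIPS base+offset addressing for loads and stores
lw   $t0, 8($s1)      # $t0 = Memory[$s1 + 8]
sw   $t0, 12($s1)     # Memory[$s1 + 12] = $t0
# Immediate mode supplies a constant inside the instruction (no memory access)
addi $t1, $zero, 42   # $t1 = 0 + 42
```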
PC-Relative Addressing Mode (LD and ST Instructions)
- Instructions: `LD` (Load) and `ST` (Store) utilize PC-Relative addressing.
- Address Calculation: `Effective Address = PC + sign-extend(PCoffset9)`, where PC is the incremented Program Counter.
- Range Limitation: PC-Relative addressing is limited to a relatively short range around the instruction in memory (-256 to +255 locations). See the sketch below.
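A minimal LC-3 fragment using PC-relative access; the labels `VALUE` and `COPY` are illustrative, and the assembler computes the 9-bit offsets from the incremented PC:

```asm
; PC-relative load and store
        LD   R2, VALUE    ; R2 = Memory[incremented PC + offset to VALUE]
        ST   R2, COPY     ; Memory[incremented PC + offset to COPY] = R2
        HALT
VALUE   .FILL x1234       ; must lie within -256..+255 of the accessing instruction
COPY    .BLKW 1           ; one reserved memory location
```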
Indirect Addressing Mode (LDI and STI Instructions)
- Instructions: `LDI` (Load Indirect) and `STI` (Store Indirect) employ Indirect Addressing.
- Address Calculation: `Effective Address = Memory[PC + sign-extend(PCoffset9)]`. This involves two memory accesses:
  - Fetch a pointer from memory using a PC-relative address.
  - Use the fetched pointer as the final memory address to access the data.
- Extended Range: Indirect addressing overcomes the range limitation of PC-Relative mode: the pointer itself must lie within PC-relative range, but the address it holds can point anywhere in memory. See the sketch below.
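A minimal LC-3 sketch of an indirect load; the label `PTR` and the target address `x4000` are illustrative:

```asm
; Indirect load: two memory accesses
        LDI  R2, PTR      ; R2 = Memory[Memory[incremented PC + offset to PTR]]
        HALT
PTR     .FILL x4000       ; PTR sits near the code; x4000 can be anywhere in memory
```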
Base+Offset Addressing Mode (LDR and STR Instructions)
- Instructions: `LDR` (Load Register) and `STR` (Store Register) utilize Base+Offset Addressing.
- Address Calculation: `Effective Address = BaseR + sign-extend(offset6)`, where BaseR is a base register specified in the instruction.
- Flexibility: Base+Offset addressing is highly flexible, allowing access to memory locations relative to a dynamically determined base address stored in a register. This is ideal for array and data-structure accesses, as sketched below.
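A minimal LC-3 sketch of array access with base+offset; it uses `LEA` (introduced next) to put the array's base address in a register, and the label and element choices are illustrative:

```asm
; Base+offset: access elements relative to a base register
        LEA  R3, ARRAY    ; R3 = address of ARRAY (the base)
        LDR  R2, R3, #2   ; R2 = Memory[R3 + 2]   -- third element
        STR  R2, R3, #0   ; Memory[R3 + 0] = R2   -- first element
        HALT
ARRAY   .BLKW 4           ; four-element array
```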
Immediate Addressing Mode (LEA Instruction)
- Instruction: `LEA` (Load Effective Address) uses Immediate Addressing Mode.
- Operation: `LEA` calculates a PC-relative address (`PC + sign-extend(PCoffset9)`) but does not access memory. Instead, it loads the calculated address value directly into the destination register.
- Purpose: `LEA` is used for address manipulation, pointer arithmetic, and efficiently loading addresses into registers, as sketched below.
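A minimal sketch of `LEA` followed by an explicit dereference; the label and stored value are illustrative:

```asm
; LEA loads an address, not data
        LEA  R4, TARGET   ; R4 = incremented PC + offset to TARGET (no memory access)
        LDR  R5, R4, #0   ; explicit dereference: R5 = Memory[R4] = x0042
        HALT
TARGET  .FILL x0042
```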
Control Flow Instructions: Conditional Branching (Revisited)
We now shift our focus back to Control Flow Instructions, specifically conditional branches.
Condition Codes in LC-3 (Revisited)
LC-3 uses condition codes (N, Z, P) to implement conditional branching. These single-bit registers are updated whenever a general-purpose register is written, reflecting whether the result was Negative, Zero, or Positive.
Conditional Branches in LC-3 (BRz - Branch if Zero) - Revisited
- Instruction: `BRz` (Branch if Zero) is a conditional branch instruction.
- Condition Test: `BRz` checks the Zero (Z) condition code.
- Branch Action: If the Z condition code is set (meaning the last operation resulted in zero), the program counter is updated to a PC-relative target address (`PC + sign-extend(PCoffset9)`). Otherwise, execution continues sequentially. See the loop sketch below.
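A small LC-3 loop sketch showing how the condition codes drive `BRz`; the counter value and register choices are illustrative:

```asm
; Count down from 3 to 0 using the Z condition code
        AND  R1, R1, #0   ; R1 = 0 (clear the register)
        ADD  R1, R1, #3   ; R1 = 3; condition codes set to P
LOOP    ADD  R1, R1, #-1  ; decrement; updates N, Z, P
        BRz  DONE         ; exit the loop once the result is zero
        BR   LOOP         ; unconditional branch (BRnzp) back to LOOP
DONE    HALT
```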
Conditional Branches in MIPS (beq - Branch if Equal) - Revisited
- Instruction: `beq` (Branch if Equal) is a conditional branch in MIPS.
- Condition Test: `beq` directly compares the values of two registers (rs and rt).
- Branch Action: If the registers are equal (`rs == rt`), the program counter is updated to a PC-relative target address (`PC + sign-extend(offset) * 4`). See the sketch below.
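A minimal MIPS sketch of `beq` in use; the labels and registers are illustrative, and branch delay slots are ignored for simplicity:

```asm
# Set $t0 to 1 if $s0 == $s1, else 0
        beq  $s0, $s1, EQUAL   # branch if the two registers are equal
        addi $t0, $zero, 0     # not-equal path
        j    DONE
EQUAL:  addi $t0, $zero, 1     # equal path
DONE:   nop                    # execution continues here either way
```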
Branch If Equal in MIPS and LC-3: Trade-off (Revisited)
The comparison of `beq` (MIPS) and the LC-3 equivalent (using `NOT`, `ADD`, and `BRz`) again underscores the trade-off between ISA complexity and code efficiency. MIPS provides a single, more complex instruction (`beq`), while LC-3 relies on a sequence of simpler instructions to achieve the same outcome, as sketched below.
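A sketch of that LC-3 sequence: negate one operand in two's complement, subtract by addition, then branch on the Z condition code (R3 is an illustrative scratch register):

```asm
; LC-3 equivalent of MIPS 'beq $s0, $s1, TARGET': branch if R1 == R2
        NOT  R3, R2       ; R3 = ~R2
        ADD  R3, R3, #1   ; R3 = -R2 (two's complement negation)
        ADD  R3, R3, R1   ; R3 = R1 - R2; sets the condition codes
        BRz  TARGET       ; taken only when R1 == R2
```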
What We Learned Today (Summary)
In this lecture, we have:
- Reviewed the Von Neumann Model and its key characteristics.
- Examined the Instruction Cycle in detail, understanding each phase and its purpose.
- Explored Data Movement Instructions and Addressing Modes in LC-3 and MIPS, focusing on PC-Relative, Indirect, Base+Offset, and Immediate addressing.
- Analyzed Control Flow Instructions, specifically conditional branches like `BRz` (LC-3) and `beq` (MIPS).
- Reinforced the concept of trade-offs in ISA design, particularly regarding instruction complexity, data types, and addressing modes.
As a reminder, there are extensive resources available to delve deeper into ISAs and microarchitecture, including video lectures and textbook chapters.
Many Different ISAs Over Decades: A Historical Perspective
Over the history of computing, numerous Instruction Set Architectures have emerged, each with its own design philosophy and target applications. Examples of ISAs spanning different eras and paradigms include:
- x86: The dominant ISA in desktop and server computing, known for its complexity and backward compatibility.
- PDP-x (PDP-11): An influential early minicomputer ISA, known for its elegance and efficient design.
- VAX: A complex instruction set computing (CISC) architecture, known for its rich instruction set and support for high-level language constructs.
- IBM 360: A foundational mainframe ISA, designed for broad applicability across commercial and scientific computing.
- CDC 6600: An early supercomputer ISA, optimized for high-performance scientific computing.
- SIMD ISAs (CRAY-1, Connection Machine): Architectures designed for Single Instruction, Multiple Data parallelism, used in vector and massively parallel processing.
- VLIW ISAs (Multiflow, Cydrome, IA-64 (EPIC)): Very Long Instruction Word architectures, aiming to exploit instruction-level parallelism through compiler scheduling.
- PowerPC, POWER: High-performance ISAs developed by IBM, Apple, and Motorola, used in workstations, servers, and embedded systems.
- RISC ISAs (Alpha, MIPS, SPARC, ARM, RISC-V): Reduced Instruction Set Computing architectures, emphasizing simplicity, efficiency, and scalability.
The fundamental differences between these ISAs lie in:
- Instruction Specification and Functionality: How instructions are defined, their operations, and their level of complexity.
- Instruction Complexity, Data Types, Addressing Modes: The trade-offs made in balancing instruction complexity, data type support, and addressing mode richness.
Complex vs. Simple Instructions + Data Types: Trade-offs Revisited
Let’s reiterate the trade-offs associated with complex versus simple instructions and data types in ISA design.
Complex Instructions: Advantages and Disadvantages
- Advantages:
  - Denser Encoding: Smaller code size, better memory utilization, reduced bandwidth requirements, improved cache performance.
  - Simpler Compiler: Reduced burden on the compiler to generate long sequences of instructions for complex operations.
- Disadvantages:
  - Limited Compiler Optimization: Larger, monolithic instructions restrict the compiler's ability to perform fine-grained optimizations.
  - More Complex Hardware: Increased hardware complexity for decoding, control logic, and execution units.
Simple Instructions: Advantages and Disadvantages
- Advantages:
  - Compiler Optimization Opportunities: Simpler instructions give compilers more flexibility for optimization and instruction scheduling.
  - Simpler Hardware: Reduced hardware complexity, potentially leading to faster clock speeds, lower power consumption, and easier verification.
- Disadvantages:
  - Less Dense Encoding: Larger code size, increased memory footprint, higher bandwidth requirements.
  - More Complex Compiler: The compiler must perform more complex instruction selection, scheduling, and optimization.
Semantic Gap: Revisited
The concept of the semantic gap is crucial for understanding ISA design choices. It refers to the distance between the high-level abstractions used by programmers and the low-level operations directly executed by the hardware.
- Small Semantic Gap (Complex ISA): ISA is closer to high-level languages, easier for programmers and compilers to use, but harder to implement efficiently in hardware.
- Large Semantic Gap (Simple ISA): ISA is further from high-level languages, requiring more complex software (compilers) to bridge the gap, but potentially easier to implement efficiently in hardware.
How to Change the Semantic Gap Tradeoffs: ISA Translation
One powerful technique to manage the semantic gap trade-off is ISA translation. This involves translating a complex, programmer-visible ISA into a simpler, more hardware-friendly “implementation” ISA that is not directly exposed to programmers.
Software and Hardware Translation Examples
- Rosetta 2 (Software Translator): Apple’s Rosetta 2 translator allows macOS to execute x86-64 applications on ARM-based Apple Silicon. This bridges the ISA gap in software, enabling compatibility without requiring x86 hardware.
- Intel and AMD Processors (Hardware Translator): Modern x86 processors from Intel and AMD internally translate complex x86 instructions into simpler micro-operations (micro-ops) that are executed by the processor’s core. This hardware translation allows them to maintain the complex x86 ISA for software compatibility while using a simpler, more efficient microarchitecture internally.
- NVIDIA Denver (Hardware/Software Translator): NVIDIA’s Denver processors also utilize a combination of hardware and software translation to execute ARM code on a custom microarchitecture.
- Transmeta (Software Translator): Transmeta’s Crusoe processors employed a software-based “Code Morphing” layer to translate x86 instructions to a proprietary VLIW (Very Long Instruction Word) ISA.
ISA-level Tradeoffs: Number of Registers (Revisited)
The number of registers in an ISA is another key design decision with significant trade-offs.
- Impact of Register Count:
  - Encoding Bits: More registers require more bits to encode register addresses within instructions, potentially increasing instruction size.
  - Register File Size: Larger register files increase the microarchitecture's hardware area, access time, and power consumption.
- Large Number of Registers: Advantages and Disadvantages
  - Advantages:
    - Improved Register Allocation: Compilers have more registers to work with, leading to better register allocation and fewer memory accesses for spilling and restoring register values.
  - Disadvantages:
    - Larger Instruction Size: Encoding more registers may increase instruction length.
    - Larger Register File: Increased hardware cost and complexity for the register file.
Detailed Lectures on ISAs & ISA Tradeoffs: Further Exploration
For those seeking a more in-depth understanding of ISA design and trade-offs, links to detailed lectures from Carnegie Mellon University are provided.
The Von Neumann Model/Architecture: Key Properties Revisited
Let’s revisit the two defining properties of the Von Neumann Model:
- Stored Program: The concept of storing both instructions and data in a unified memory.
- Sequential Instruction Processing: The principle of executing instructions in a linear sequence dictated by the program counter.
A key question to ponder is: When is a value in memory interpreted as an instruction versus data?
Recall: The Instruction Cycle - Value Interpretation
The answer lies in the Instruction Cycle. A value fetched from memory is interpreted as an instruction if it is fetched during the FETCH phase of the instruction cycle. Conversely, a value fetched during the FETCH OPERANDS phase is interpreted as data. The context within the instruction cycle determines the interpretation.
The von Neumann Model/Architecture: Recommended Readings
For further exploration of the Von Neumann model, recommended readings are:
- Burks, Goldstine, and von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," 1946: The original report outlining the Von Neumann architecture.
- Patt and Patel book, Chapter 4, “The von Neumann Model”: A textbook chapter providing a clear and accessible explanation of the model.
The Von Neumann Model (of a Computer) - Diagram (Final View)
The lecture concludes with a final view of the Von Neumann model diagram, solidifying our understanding of its architecture.
Questioning the Von Neumann Model: Alternatives Exist
The lecture poses a critical question: Is the Von Neumann model the only way to process computer programs?
Answer: No. While dominant, the Von Neumann model is not the only paradigm.
Qualified Answer: It has been the dominant paradigm for decades, but alternative models exist, such as Dataflow Architectures.
The Dataflow Execution Model of a Computer: An Alternative Paradigm
Let’s briefly introduce the Dataflow Execution Model as a fundamentally different approach to computation compared to the Von Neumann model.
Von Neumann vs. Dataflow: Key Differences
- Von Neumann Model (Control Flow): Instructions are fetched and executed in a control flow order dictated by the Program Counter. Execution is sequential unless explicitly altered by control flow instructions.
- Dataflow Model (Data Flow): Instructions are fetched and executed in data-flow order: the program forms a directed acyclic graph (DAG), and an instruction "fires" (executes) only when all of its required input operands are ready (available). There is no program counter in a pure dataflow architecture; instruction ordering is determined by data dependencies, leading to inherent parallelism.
More on Dataflow: Nodes and ISA Representation
In a dataflow machine, programs are represented as dataflow graphs, composed of dataflow nodes. A node fires (executes) when all its input data tokens are available. The slide shows a simple representation of a dataflow node and its potential ISA encoding.
Example Dataflow Nodes: Conditional, Relational, Barrier Synch
Examples of different types of dataflow nodes are presented:
- Conditional Node (BR - Branch): Implements conditional branching based on a boolean control input.
- Relational Node (> - Greater Than): Performs a relational comparison and outputs a boolean (TRUE/FALSE) result.
- Barrier Synch Node: A synchronization primitive that waits for all input values to arrive before proceeding, useful for parallel processing coordination.
A Simple Example Dataflow Program: Factorial Calculation
A simple dataflow program for calculating factorial is presented as an illustration. This example demonstrates how data dependencies and node firing govern program execution in a dataflow model.
ISA-level Tradeoff: Program Counter (PC) - Dataflow vs. Control Flow
The fundamental ISA-level trade-off between the Von Neumann and Dataflow models centers around the Program Counter (PC):
- Program Counter (PC in ISA - Von Neumann): Enables control-driven, sequential execution. Instructions are executed when the PC points to them, providing predictable sequential control flow.
- No Program Counter (Dataflow): Enables data-driven, parallel execution. Instructions are executed when their operands are ready, allowing for inherent parallelism and potentially higher performance for data-parallel tasks.
The choice of including or excluding a PC in the ISA has profound implications for programming models, compiler design, performance characteristics, and hardware complexity.
ISA vs. Microarchitecture Level Tradeoff (Revisited)
The control flow vs. data-driven execution trade-off also manifests at the microarchitecture level. While the ISA may expose a sequential, Von Neumann programming model, the underlying microarchitecture can employ dataflow principles internally to enhance performance through parallelism.
- ISA (Programmer’s View): Specifies whether the programmer perceives a sequential (control-flow) or dataflow execution model.
- Microarchitecture (Implementation): Determines how instructions are actually executed. A microarchitecture can implement a Von Neumann ISA using dataflow principles internally, executing instructions out of order as long as the architectural semantics (programmer-visible behavior) are preserved.
Let’s Get Back to the von Neumann Model (and Dataflow Resources)
While Dataflow architectures offer an interesting alternative, the Von Neumann model remains dominant in general-purpose computing today. For those interested in exploring Dataflow in more detail, resources like research papers and video lectures are recommended.
The von Neumann Model: Dominance and Microarchitectural Variations
The Von Neumann model underpins virtually all major ISAs today (x86, ARM, MIPS, SPARC, RISC-V, etc.). However, modern microarchitectures implementing these ISAs often deviate significantly from the strict sequential execution of the Von Neumann model internally to achieve higher performance. Techniques like pipelining, superscalar execution, out-of-order execution, and separate instruction and data caches are common microarchitectural enhancements that break the sequential processing paradigm at the hardware level, while still presenting a sequential abstraction to the programmer at the ISA level.
What is Computer Architecture? Definitions (Revisited)
Let’s revisit the definition of Computer Architecture, considering both ISA and implementation perspectives.
ISA + Implementation Definition
Computer Architecture is the science and art of designing, selecting, and interconnecting hardware components and designing the hardware/software interface to create a computing system that meets functional, performance, energy consumption, cost, and other specific goals. This definition encompasses both the ISA and the microarchitecture.
Traditional (ISA-only) Definition (Gene Amdahl)
The traditional, ISA-centric definition focuses on architecture as “the attributes of a system as seen by the programmer,” emphasizing the conceptual structure and functional behavior as distinct from the implementation details of dataflow, control, logic design, and physical realization. This definition primarily focuses on the ISA as the programmer-visible interface.
ISA vs. Microarchitecture: Key Distinction
- ISA (Instruction Set Architecture): The agreed-upon interface between software and hardware. It defines the instructions, data types, and addressing modes that programmers can use. It is the contract between software and hardware.
- Microarchitecture: The specific implementation of an ISA in hardware. It is hidden from the software and involves the organization of the data path, control logic, and other hardware components to execute the ISA efficiently.
- Microprocessor: The physical chip embodying the ISA, microarchitecture, and underlying circuits. “Architecture” in common usage often refers to the combination of ISA and microarchitecture.
Microarchitecture: Implementation Variations
Many different microarchitectures can implement the same ISA. This allows for innovation and optimization in hardware design without breaking software compatibility. Different microarchitectures implementing the same ISA can vary significantly in performance, power consumption, and cost. Examples of diverse microarchitectural implementations for common ISAs like MIPS, x86, POWER, ARM, and RISC-V are provided.
ISA vs. Microarchitecture: Car Analogy
The car analogy is revisited to further illustrate the ISA-Microarchitecture distinction:
- Gas Pedal (ISA): The programmer-visible interface for “acceleration.” The ISA defines what happens (acceleration) when a certain action is taken (pressing the pedal).
- Engine Internals (Microarchitecture): The underlying implementation of “acceleration” is hidden from the user. The microarchitecture determines how acceleration is achieved (gasoline engine, electric motor, etc.).
Just as car engines can vary greatly while still providing the same basic user interface (gas pedal), microarchitectures can differ significantly while implementing the same ISA.
ISA vs. Microarchitecture: Implementation Flexibility
The key takeaway is that the microarchitecture provides a flexible implementation layer, allowing hardware designers to innovate and optimize performance beneath the stable ISA interface. This separation ensures software compatibility while enabling continuous hardware advancements.
Microarchitecture typically changes and evolves much faster than the ISA, as microarchitectural innovations are crucial for driving performance improvements in each new generation of processors.
ISA: What Does It Specify? (Comprehensive List)
The lecture concludes with a comprehensive list of elements typically specified by an ISA, highlighting the breadth and depth of the ISA specification beyond just instructions and addressing modes.
ISA Manuals: Essential Resources
Finally, ISA manuals are presented as valuable resources for those seeking a deep understanding of specific ISAs like Intel’s x86-64 and ARM’s A64. These manuals provide detailed specifications of instructions, data types, addressing modes, and other architectural features.
Continue here: 09.5 Assembly Programming