Lecture from: 16.03.2023 | Video: YT

Timing and Verification

Last Lecture

In the previous lecture, we covered Timing and Verification in digital circuits. We explored:

  • Timing in combinational circuits:
    • Propagation delay and contamination delay.
    • Glitches and how they arise.
  • Timing in sequential circuits:
    • Setup time and hold time requirements for registers.
    • Determining the maximum operating frequency of a circuit based on timing constraints.
  • Circuit Verification & Testing:
    • Methods to ensure a circuit works correctly.
    • Functional verification & testing to check logical correctness.
    • Timing verification & testing to check correctness considering timing.
    • Offline vs. online testing approaches.

Timing

Timing in a Single Sequential Component

The clock cycle time in a sequential circuit is limited by the maximum logic delay within the combinational paths connecting sequential components.

This maximum delay determines the minimum time required for signals to propagate through the combinational logic and be reliably captured by the next register on the clock edge.

Timing in Multiple Sequential Components

When dealing with multiple sequential components, the clock cycle time must accommodate the maximum logic delay across all sequential components in the system.

The system’s clock frequency is thus limited by the slowest combinational path among all sequential components.
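
A rough illustration of this constraint, assuming the usual register-to-register timing model (clock-to-Q delay, combinational logic delay, setup time) with made-up delay values:

# Minimal sketch: the clock period must cover the slowest register-to-register path.
# All delay values below are hypothetical, in nanoseconds.
paths = {
    "path_A": {"t_clk_to_q": 0.3, "t_logic": 2.5, "t_setup": 0.2},
    "path_B": {"t_clk_to_q": 0.3, "t_logic": 1.8, "t_setup": 0.2},
    "path_C": {"t_clk_to_q": 0.3, "t_logic": 3.1, "t_setup": 0.2},
}

# Each path must satisfy: T_clock >= t_clk_to_q + t_logic + t_setup.
t_clock_min = max(sum(p.values()) for p in paths.values())
print(f"Minimum clock period: {t_clock_min:.1f} ns")   # limited by the slowest path
print(f"Maximum frequency:    {1 / t_clock_min:.2f} GHz")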

A Final Word on Timing

Meeting timing constraints is crucial and can be achieved through principled design. The clock cycle time is set by the maximum logic delay in the circuit: the cycle must be long enough for signals to traverse the slowest combinational path without violating timing constraints. Key design principles to achieve this are:

  • Critical Path Design: Minimize the maximum logic delay. This directly maximizes performance by allowing for a shorter clock cycle.
  • Balanced Design: Balance the maximum logic delays across different parts of the system. This prevents bottlenecks and minimizes wasted time where some parts of the circuit are faster than others, waiting for the slowest part to complete.
  • Bread and Butter Design: Optimize for the common case, the most frequent operations, ensuring that less frequent operations do not negatively impact the overall performance. This maximizes performance for typical workloads.

Verification

Making Sure A Design Works Correctly

Verification and Testing (V&T) are essential to ensure design correctness.

  • Functional Verification & Testing: Ensures the circuit operates logically as intended, producing the correct outputs for given inputs.
  • Timing Verification & Testing: Ensures the circuit operates correctly under timing constraints, considering propagation delays and setup/hold times.

V&T are highly time- and resource-consuming, often dominating the overall design and manufacturing effort. Verifying complex circuits with billions of transistors is extremely challenging. Even with rigorous V&T, errors can still slip into the field, making online verification and testing critical in modern systems.

Recall: Silent Data Corruption In-the-Field and RowHammer

Real-world examples like Silent Data Corruption and RowHammer highlight the critical importance of V&T. Silent Data Corruption demonstrates that errors can occur in manufactured chips due to subtle defects, leading to incorrect computations without any obvious error signals. RowHammer illustrates how hardware vulnerabilities can be exploited for security breaches. These examples underscore the ongoing challenge of building truly robust and reliable computing systems and the necessity for thorough and continuous verification and testing.

Challenge and Opportunity

Building fundamentally reliable, safe, and secure systems remains a significant challenge and a vital opportunity. Verification and Testing, both offline and online, are crucial for addressing this challenge. You will gain a glimpse of V&T in your labs. To prepare for this, it is recommended to watch Lecture 6c (Verification & Testing), which provides an overview of V&T approaches and examples of testing in Verilog.

What Have We Learned So Far?

We have largely completed the “Digital Design” portion of the course, focusing on transistors, logic gates, and sequential circuits. We are now transitioning to the “Computer Architecture” part, moving up the abstraction level.

Agenda for Today & Next Few Lectures

Today’s agenda and the next few lectures will cover:

  • The von Neumann model.
  • LC-3: An example of a von Neumann machine.
  • LC-3 and MIPS Instruction Set Architectures (ISAs).
  • LC-3 and MIPS assembly and programming.
  • Introduction to microarchitecture and single-cycle microarchitecture.
  • Multi-cycle microarchitecture.

What Will We Learn Today?

In this lecture, we will focus on:

  • Basic elements of a computer and the von Neumann model, using LC-3 as an example.
  • Instruction Set Architectures (ISAs), specifically LC-3 and MIPS ISAs, covering:
    • Operate instructions.
    • Data movement instructions.
    • Control instructions.
  • Instruction formats.
  • Addressing modes.

Building a Computing System & The von Neumann Model

We are now moving towards understanding how to build a complete computing system, starting with the fundamental von Neumann model.

What is A Computer?

A computer fundamentally consists of three components: Processing, Memory, and Input/Output (I/O). We will cover all three components in detail.

Building Up to A Basic Computer Model

In previous lectures, we learned to design combinational and sequential logic structures. Using these, we can build essential computer components like execution units, decision units, memory units, and communication units. All these are basic elements of a computer. We are now raising our abstraction level to use these logic structures and construct a basic computer model.

Basic Components of a Computer

To perform a task using a general-purpose computer, we need:

  • A computer program: A set of instructions that specifies what the computer must do.
  • The computer itself: The hardware that carries out the specified task.

A program is a set of instructions, where each instruction is a well-defined piece of work for the computer to perform. The instruction set is the complete set of all possible instructions the computer is designed to execute.

The von Neumann Model

To execute computer programs, we need an execution model. John von Neumann proposed a fundamental model in 1946. The von Neumann Model consists of five components:

  1. Memory: Stores both the program (instructions) and data.
  2. Processing Unit: Performs arithmetic and logical operations.
  3. Input: Allows data and instructions to enter the computer.
  4. Output: Allows results to be displayed or stored.
  5. Control Unit: Controls the sequential order of instruction execution.

Throughout this lecture, we will examine the von Neumann model using LC-3 and MIPS as examples. Almost all general-purpose computers today are based on the von Neumann model.

The von Neumann Model Diagram

The diagram illustrates the components of the von Neumann model and their interconnections:

  • Memory: Contains Memory Address Register (MAR) and Memory Data Register (MDR).
  • Processing Unit: Contains Arithmetic Logic Unit (ALU) and TEMP (temporary storage - registers).
  • Control Unit: Contains Program Counter (PC) or Instruction Pointer (IP) and Instruction Register (IR).
  • Input: Keyboard, Mouse, Disk…
  • Output: Monitor, Printer, Disk…

Memory

Memory in the von Neumann model stores both programs and data. It contains bits, logically grouped into bytes (8 bits) and words (e.g., 8, 16, 32 bits). The address space is the total number of uniquely identifiable locations in memory. Addressability refers to the number of bits stored in each memory location (e.g., byte-addressable, word-addressable).

Recall the structure of a memory array, which we discussed in earlier lectures.

Memory Details

  • Memory stores: Programs and data.
  • Memory contains: Bits, grouped into bytes and words.
  • Address space: Total addressable locations (e.g., LC-3: 2^16, MIPS: 2^32, x86-64: up to 2^48).
  • Addressability: Bits stored per location (e.g., byte-addressable).
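
As a quick sanity check of these numbers, total capacity is the number of locations times the bits stored per location:

# Total memory capacity = (number of locations) x (bits per location).
# LC-3: 2^16 word-addressable locations, 16 bits each.
lc3_bits  = (2**16) * 16
# MIPS: 2^32 byte-addressable locations, 8 bits each.
mips_bits = (2**32) * 8

print(lc3_bits // 8 // 1024, "KiB")     # LC-3:  128 KiB
print(mips_bits // 8 // 2**30, "GiB")   # MIPS:  4 GiB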

A Simple Example Memory

A simple memory example with 8 locations, each storing 8 bits (one byte). This memory is byte-addressable with an address space of 8. Value 6 is stored at address 4, and value 4 is stored at address 6.

Word-Addressable Memory

In word-addressable memory, each data word has a unique address. In MIPS, each 32-bit word has a unique address, and in LC-3, each 16-bit word has a unique address. The diagram below shows an example of word-addressable MIPS memory.

Byte-Addressable Memory

In byte-addressable memory, each byte has a unique address. MIPS is actually byte-addressable, as is LC-3b (an updated version of LC-3). The diagram shows an example of byte-addressable MIPS memory, where each byte within a word has a distinct address.

Question: How are these four bytes ordered within a word? Which byte is the most significant versus least significant?

Accessing Memory: MAR and MDR

Memory access involves two primary operations:

  1. Reading (loading): Retrieving data from a memory location.
  2. Writing (storing): Placing data into a memory location.

Two registers are typically used for memory access:

  • Memory Address Register (MAR): Holds the address of the memory location to be accessed.
  • Memory Data Register (MDR): Holds the data being read from or written to memory.

To Read:

  1. Load the MAR with the address to be read.
  2. The data from the corresponding memory location is placed into the MDR.

To Write:

  1. Load the MAR with the address and the MDR with the data to be written.
  2. Activate the Write Enable signal. This causes the value in the MDR to be written to the address specified by the MAR.
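
A toy Python model of this read/write protocol (class and method names are made up; this only mimics the MAR/MDR interface described above, not a real datapath):

class Memory:
    """Toy memory accessed only through MAR and MDR."""

    def __init__(self, size):
        self.cells = [0] * size
        self.mar = 0   # Memory Address Register
        self.mdr = 0   # Memory Data Register

    def read(self, address):
        self.mar = address                 # 1. load MAR with the address
        self.mdr = self.cells[self.mar]    # 2. memory places the data into MDR
        return self.mdr

    def write(self, address, data):
        self.mar = address                 # 1. load MAR with the address...
        self.mdr = data                    #    ...and MDR with the data
        self.cells[self.mar] = self.mdr    # 2. write enable: MDR -> memory[MAR]


mem = Memory(8)
mem.write(4, 6)
print(mem.read(4))   # -> 6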

Endianness

Big Endian vs. Little Endian

The ordering of bytes within a word is defined by the endianness of the architecture.

  • Big Endian: The most significant byte (MSB) is stored at the lowest byte address, and the least significant byte (LSB) is at the highest byte address.
  • Little Endian: The least significant byte (LSB) is stored at the lowest byte address, and the most significant byte (MSB) is at the highest byte address.

The terminology “Big Endian” and “Little Endian” is derived from Jonathan Swift’s “Gulliver’s Travels.”

Big Endian vs. Little Endian Diagram

The diagrams visually represent the byte ordering in Big Endian and Little Endian systems.

Question: Does endianness really matter? Answer: No, it is just a convention, as long as you are consistent within a system. Qualified Answer: No, except when big-endian and little-endian systems need to share or exchange data. In such cases, endianness must be considered to ensure correct data interpretation.
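
A short sketch of both conventions using Python's int.to_bytes; the 32-bit value 0x01234567 is just an example:

value = 0x01234567   # a 32-bit example value

big    = value.to_bytes(4, byteorder="big")     # MSB at the lowest byte address
little = value.to_bytes(4, byteorder="little")  # LSB at the lowest byte address

print(big.hex(" "))     # 01 23 45 67
print(little.hex(" "))  # 67 45 23 01

Exchanging these raw bytes between systems that assume different orderings would flip the interpretation of the word, which is exactly the case where endianness must be considered.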

Processing Unit

The Processing Unit is responsible for performing the actual computations within the computer.

The processing unit often consists of multiple functional units, with the Arithmetic and Logic Unit (ALU) being a core component. The ALU executes computation and logic operations. Examples of operations include:

  • LC-3: ADD, AND, NOT (XOR in LC-3b).
  • MIPS: add, sub, mult, and, nor, sll, srl, slt…

The ALU processes words, which are the fundamental data quantities. The word length varies across architectures (LC-3: 16 bits, MIPS: 32 bits).

ALU (Arithmetic Logic Unit)

The Arithmetic Logic Unit (ALU) combines a variety of arithmetic and logical operations into a single functional unit. It performs only one function at a time, selected by control signals. ALUs are typically represented by a specific symbol.

The diagram below shows an example of a simple ALU implementation, illustrating how different operations can be selected and performed.
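
A minimal behavioral sketch of such an ALU, using LC-3's ADD/AND/NOT operations on 16-bit words; the control-signal encoding here is invented for illustration:

def alu(a, b, control):
    """One operation at a time, selected by the control signal."""
    mask = 0xFFFF                       # keep results to a 16-bit word
    if control == 0b00:                 # ADD
        return (a + b) & mask
    elif control == 0b01:               # AND
        return a & b & mask
    elif control == 0b10:               # NOT (ignores b)
        return ~a & mask
    raise ValueError("unsupported ALU control value")

print(hex(alu(0x0005, 0x0003, 0b00)))   # 0x8    (ADD)
print(hex(alu(0x000F, 0x0003, 0b01)))   # 0x3    (AND)
print(hex(alu(0x0000, 0x0000, 0b10)))   # 0xffff (NOT)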

Registers

To enhance performance, processing units include a small amount of fast temporary storage located very close to the ALU. This storage is used to hold intermediate values and data that needs to be accessed quickly. This temporary storage is implemented using registers.

  • Memory is large but slow.
  • Registers within the Processing Unit offer fast access to values that are being actively processed by the ALU. Typically, one register stores one word.
  • Register Set or Register File is a collection of registers that instructions can directly manipulate.
    • LC-3 has 8 general-purpose registers (GPRs), named R0 to R7 (3-bit register number), each holding 16 bits (word length).
    • MIPS has 32 general-purpose registers, named R0 to R31 (5-bit register number), each holding 32 bits (word length).

Recall the register structure, which we discussed in earlier lectures. Registers are built from D flip-flops (themselves constructed from latches) to store data.

A simplified symbol for a register.

Registers are implemented using multiple parallel D flip-flops, each storing one bit.
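
A behavioral sketch of a register file with LC-3's dimensions (8 registers, 16 bits each); real hardware uses flip-flops and decoders, this just mimics the read/write interface:

class RegisterFile:
    def __init__(self, num_regs=8, width=16):
        self.regs = [0] * num_regs
        self.mask = (1 << width) - 1    # keep values to the word length

    def read(self, reg_num):
        return self.regs[reg_num]

    def write(self, reg_num, value):
        self.regs[reg_num] = value & self.mask


rf = RegisterFile()                      # R0..R7, 16 bits each (LC-3)
rf.write(1, 25)                          # R1 <-- 25
rf.write(2, 17)                          # R2 <-- 17
rf.write(0, rf.read(1) + rf.read(2))     # R0 <-- R1 + R2
print(rf.read(0))                        # -> 42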

MIPS Register File (Conventions)

MIPS defines conventions for register usage, assigning specific roles to different registers, such as for constants, assembler temporaries, function arguments, return values, saved variables, and pointers.

Input/Output

Input and Output (I/O) devices are peripherals that allow information to enter and leave the computer. Common input devices include keyboards, mice, and scanners. Common output devices include monitors and printers. Disks can function as both input and output devices. In LC-3, we primarily consider the keyboard (input) and monitor (output).

Control Unit

The Control Unit acts as the “conductor of an orchestra,” orchestrating the execution of instructions within the computer.

The Control Unit is responsible for:

  • Conducting the step-by-step process of executing each instruction in a program in sequence.
  • Keeping track of the current instruction being processed using the Instruction Register (IR), which holds the current instruction.
  • Keeping track of the next instruction to be processed using the Program Counter (PC) or Instruction Pointer (IP), which holds the memory address of the next instruction.

Programmer Visible (Architectural) State

The programmer-visible state (or architectural state) comprises:

  • Registers: Fast temporary storage, accessed by name in the ISA (general-purpose or special-purpose).
  • Memory: Array of storage locations indexed by addresses.
  • Program Counter: Memory address of the current (or next) instruction.

Instructions and programs specify how to transform the values of this programmer-visible state.

Stored Program & Sequential Execution

The von Neumann model is also known as the stored program computer. It has two key properties:

  1. Stored program:

    • Instructions are stored in a linear memory array, just like data.
    • Memory is unified for both instructions and data (no distinction in memory itself).
    • The interpretation of a stored value as either instruction or data depends on the control signals and the stage of instruction processing.
  2. Sequential instruction processing:

    • One instruction is processed (fetched, executed, completed) at a time.
    • The program counter (instruction pointer) identifies the current instruction.
    • The program counter is advanced sequentially (incremented) except for control transfer instructions (like jumps or branches).
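
Both properties can be seen in a toy interpreter loop (this is not LC-3; the "instructions" are Python tuples used purely for illustration):

# Toy stored-program machine: instructions and data live in the same memory.
memory = {
    0: ("ADD", 1, 2, 0),   # R0 <-- R1 + R2
    1: ("JUMP", 3),        # control transfer: PC <-- 3
    2: ("ADD", 0, 0, 0),   # skipped by the jump
    3: ("HALT",),
    100: 7,                # data shares the same address space
}
regs = {0: 0, 1: 5, 2: 6}

pc = 0
while True:
    inst = memory[pc]                  # fetch the instruction the PC points to
    pc += 1                            # sequential: advance the PC by default
    if inst[0] == "ADD":
        _, src1, src2, dst = inst
        regs[dst] = regs[src1] + regs[src2]
    elif inst[0] == "JUMP":
        pc = inst[1]                   # control transfer overrides the PC
    elif inst[0] == "HALT":
        break

print(regs[0])   # -> 11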

LC-3: A von Neumann Machine

LC-3 is a simplified example of a von Neumann machine that we will use throughout this course to understand computer architecture principles.

A detailed diagram of the LC-3 architecture, illustrating the five components and their interconnections.

  • Instructions and data are stored in memory, typically with the instruction length equal to the word length.
  • The processor fetches instructions sequentially from memory, decodes and executes them, and then proceeds to the next instruction.
  • The address of the current instruction is stored in the Program Counter (PC).
    • In word-addressable memory, the PC is incremented by 1 (as in LC-3).
    • In byte-addressable memory, the PC is incremented by the instruction length in bytes (by 4 in MIPS).
    • The operating system (OS) typically sets the PC to a starting address (e.g., 0x00400000 in MIPS) at program startup.

An example of a simple MIPS program stored in memory, showing both assembly code and the corresponding machine code in memory.

The Instruction

An instruction is the most basic unit of computer processing. Instructions are the “words” of the computer’s language, and the Instruction Set Architecture (ISA) is the vocabulary. The computer’s language can be expressed in:

  • Machine language: Computer-readable representation (0s and 1s).
  • Assembly language: Human-readable representation.

We will study LC-3 and MIPS instructions, noting that the underlying principles are similar across various ISAs (x86, ARM, RISC-V).

Opcode & Operands

An instruction is composed of two main parts:

  • Opcode: Specifies what the instruction does (the operation).
  • Operands: Specify what the instruction acts upon (the data or registers involved).

Both opcode and operands are defined within the instruction format (or encoding). For example, in LC-3, a 16-bit instruction has bits [15:12] for the opcode and bits [11:0] for operand information.

Instruction Types

There are three main types of instructions:

  1. Operate instructions: Execute operations within the ALU (e.g., arithmetic and logical operations).
  2. Data movement instructions: Read data from or write data to memory.
  3. Control flow instructions: Change the sequence of program execution (e.g., branches, jumps).

We will begin by examining operate instructions.

An Example Operate Instruction (Addition)

Consider the addition operation. In high-level code, it might be represented as a = b + c;. In assembly language, it could be add a, b, c.

  • add: Mnemonic for the addition operation.
  • b, c: Source operands (values to be added).
  • a: Destination operand (where the result is stored).

The instruction’s semantics can be represented as a <-- b + c.

Registers Mapping

Variables in high-level code are mapped to registers in assembly language. For example:

  • LC-3 registers: b mapped to R1, c to R2, a to R0.
  • MIPS registers: b mapped to $s1, c to $s2, a to $s0.

Example in LC-3

From Assembly to Machine Code in LC-3 (Addition)

Let’s translate the LC-3 assembly instruction ADD R0, R1, R2 into machine code.

  • LC-3 Assembly: ADD R0, R1, R2
  • Field Values: OP = 1, DR = 0, SR1 = 1, SR2 = 2 (using binary representations).
  • Machine Code (Instruction Encoding): 0001 000 001 0 00 010 (binary), 0x1042 (hexadecimal).

Instruction Format (or Encoding) - LC-3 Operate

The LC-3 Operate Instruction Format (Register OP Register) is defined as:

  • OP (4 bits): Opcode (e.g., ADD = 0001, AND = 0101). Semantics: DR <-- SR1 + SR2 (for ADD), DR <-- SR1 AND SR2 (for AND).
  • DR (3 bits): Destination register.
  • SR1, SR2 (3 bits each): Source registers. Bits [5:3] are 000 in this register form.
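
Packing the fields of the earlier example, ADD R0, R1, R2, into its 16-bit encoding (a small sketch; the helper function name is made up):

def encode_lc3_add_reg(dr, sr1, sr2):
    """Register-form ADD: 0001 | DR(3) | SR1(3) | 0 00 | SR2(3)."""
    opcode = 0b0001
    return (opcode << 12) | (dr << 9) | (sr1 << 6) | (0 << 3) | sr2

print(hex(encode_lc3_add_reg(dr=0, sr1=1, sr2=2)))   # -> 0x1042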

Example in MIPS

From Assembly to Machine Code in MIPS (Addition)

Let’s translate the MIPS assembly instruction add $s0, $s1, $s2 into machine code.

  • MIPS Assembly: add $s0, $s1, $s2
  • Field Values: op = 0, rs = 17 ($s1), rt = 18 ($s2), rd = 16 ($s0), shamt = 0, funct = 32 (add function code).
  • Machine Code (Instruction Encoding): 000000 10001 10010 10000 00000 100000 (binary), 0x02328020 (hexadecimal).

The semantics is rd <-- rs + rt.

Instruction Format: R-Type in MIPS

The MIPS R-type Instruction Format is defined as:

  • op (6 bits): Opcode (always 0 for R-type).
  • rs, rt (5 bits each): Source registers.
  • rd (5 bits): Destination register.
  • shamt (5 bits): Shift amount (for shift operations).
  • funct (6 bits): Function code (specifies the specific operation for R-type instructions).
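
Packing the R-type fields for add $s0, $s1, $s2 (register numbers as in the example above; the helper function is just for illustration):

def encode_mips_rtype(op, rs, rt, rd, shamt, funct):
    """R-type: op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

# add $s0, $s1, $s2  ->  rd = $s0 (16), rs = $s1 (17), rt = $s2 (18), funct = 32
word = encode_mips_rtype(op=0, rs=17, rt=18, rd=16, shamt=0, funct=32)
print(hex(word))   # -> 0x2328020 (Python drops the leading zero of 0x02328020)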

Reading Operands from Memory

Besides operate instructions, we need data movement instructions to access operands from memory. This involves:

  • Loading operands from memory into registers.
  • Storing results from registers back to memory.

Next, we will focus on how to read (load) data from memory. Writing (storing) will be discussed later.

Reading Word-Addressable Memory (Load Word)

The Load Word instruction is used to read data from memory.

  • High-level code: a = A[i];
  • Assembly: load a, A, i
  • Mnemonic: load indicates the load word operation.
  • Operands:
    • A: Base address of the array.
    • i: Offset from the base address (immediate or literal).
    • a: Destination operand (register to load data into).
  • Semantics: a <-- Memory[A + i]

Load Word in LC-3 and MIPS

Examples of Load Word instructions in LC-3 and MIPS assembly:

  • LC-3 Assembly: LDR R3, R0, #2 (Load Register). Semantics: R3 <-- Memory[R0 + 2]
  • MIPS Assembly (Word-Addressable): lw $s3, 2($s0) (Load Word). Semantics: $s3 <-- Memory[$s0 + 2]

These instructions use the base+offset addressing mode, where the memory address is calculated by adding a base address and an offset.

Load Word in Byte-Addressable MIPS

In byte-addressable MIPS, word offsets in load word instructions must be scaled by the word size (4 bytes), because addresses count bytes rather than words.

  • MIPS Assembly (Byte-Addressable): lw $s3, 8($s0) (Load Word). Semantics: $s3 <-- Memory[$s0 + 8]

The byte address is calculated as word_address * bytes/word. In MIPS, there are 4 bytes per word.
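
A short sketch of the scaling: to access A[i] in byte-addressable MIPS, the byte offset is 4 * i (the base address and index here are made up for illustration):

BYTES_PER_WORD = 4           # MIPS words are 4 bytes

base_address = 0x1000_0000   # hypothetical base address of array A
i = 2                        # we want A[2]

byte_address = base_address + BYTES_PER_WORD * i
print(hex(byte_address))     # -> 0x10000008, i.e., offset 8, as in lw $s3, 8($s0)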

Instruction Format With Immediate

Instruction formats for Load Word with immediate offset in LC-3 and MIPS:

  • LC-3 (LDR): OP (4 bits, 0110), DR (3 bits, destination register), BaseR (3 bits, base register), offset6 (6 bits, sign-extended immediate). Semantics: DR <-- Memory[BaseR + SEXT(offset6)].

  • MIPS (lw, I-type): op (6 bits, 100011), rs (5 bits, base register), rt (5 bits, destination register), imm (16 bits, sign-extended offset). Semantics: rt <-- Memory[rs + SEXT(imm)].

This marks the point where the lecture paused. We will continue with the instruction processing cycle and more details on instruction set architectures in the next lecture.

Continue here: 08 Instruction Set Architectures II