Lecture from: 12.05.2023 | Video: YT

This lecture initiated the memory systems segment of the course, underscoring memory’s critical and often dominant role in computing systems. The session delved into the fundamental organization and technology of memory, explaining the hierarchical structure of modern memory systems and the operational principles of DRAM and SRAM.

The Critical Importance of Memory in Computing Systems

Memory is a fundamental and increasingly critical component of modern computing. Its significance goes far beyond simply storing data; it dictates performance, energy efficiency, reliability, and even security.

Modern systems are dominated by memory and storage hardware. On processor dies, large areas are dedicated to fast on-chip caches (SRAM), and chips are connected to massive off-chip DRAM main memories and slower storage devices (SSDs, HDDs). This trend is driven by the exponential growth of data in applications like AI/ML, genomics, and data analytics, which require processing vast datasets. While compute performance has grown significantly, memory capacity and bandwidth have not kept pace proportionally, and memory latency has improved only marginally, making memory a major bottleneck. Data movement energy is particularly problematic, often consuming orders of magnitude more energy than computation.

Reliability and security are also heavily dependent on memory. Semiconductor manufacturing scaling leads to denser, more vulnerable DRAM cells. Charge leakage and access interference (like the RowHammer effect) can cause silent data corruption (bit flips). RowHammer, for example, allows attackers to induce bit flips in physically adjacent rows by repeatedly accessing a single row, breaking memory isolation and posing a serious security threat by potentially corrupting critical data structures like page tables.
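
To make the access pattern concrete, below is a minimal C sketch of the classic two-address hammering loop (an illustration of the mechanism, not a working exploit). The buffer size and the 1 MiB spacing between the two addresses are assumptions chosen purely for illustration; a real attack must first find addresses that map to different rows of the same bank using the platform's DRAM address mapping. The cache-line flushes force every iteration to actually reach DRAM instead of hitting in the caches.

```c
#include <emmintrin.h>   // _mm_clflush (x86 SSE2 intrinsic)
#include <stdint.h>
#include <stdlib.h>

// Two-address hammering loop: repeatedly activate two rows of the same bank
// so that physically adjacent rows are opened and closed many times per
// refresh window, which can disturb their stored charge.
static void hammer(volatile uint8_t *x, volatile uint8_t *y, long iters) {
    for (long i = 0; i < iters; i++) {
        (void)*x;                       // read -> activates x's row
        (void)*y;                       // read -> activates y's row, closing x's
        _mm_clflush((const void *)x);   // flush from caches so the next
        _mm_clflush((const void *)y);   // iteration goes to DRAM again
    }
}

int main(void) {
    // Assumption: two addresses 1 MiB apart in a large buffer often map to
    // different rows of the same bank; real attacks derive such pairs from the
    // memory controller's address mapping, which is not modeled here.
    uint8_t *buf = malloc(8u << 20);
    if (!buf) return 1;
    hammer(buf, buf + (1u << 20), 1000000);
    free(buf);
    return 0;
}
```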

The lecture emphasized that despite decades of effort and significant hardware dedication, memory remains a primary bottleneck, illustrating the complex interplay of performance, energy, and vulnerability trade-offs.

Memory Fundamentals

I really suggest watching this (really really impressive) explanation video on DRAM chips…

From a programmer’s perspective, memory is typically viewed as a simple, large, byte-addressable linear address space, accessible via load and store instructions. This simplified view is enabled by the virtual memory abstraction, a crucial system concept.

Virtual Memory

The programmer operates within a potentially vast virtual address space. The system (hardware, like the Memory Management Unit or MMU, and system software, like the OS) maps segments of this virtual space onto a smaller, finite physical memory (DRAM) and potentially slower backing storage (disk). This mapping is transparent to the programmer, simplifying software development by hiding physical memory limitations and allowing address space isolation between processes. While beneficial for programming, it adds hardware complexity (MMU, page tables) and potential performance overheads (page faults, Translation Lookaside Buffer - TLB misses) that are typically managed by the operating system.
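
As a toy illustration of the translation step (a minimal sketch, not how a real MMU or multi-level page table works; the page table contents are made up), the virtual address is split into a virtual page number and a page offset, and only the page number is remapped:

```c
#include <stdint.h>
#include <stdio.h>

// Toy model: 4 KiB pages => the low 12 bits are the offset within a page.
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

// Hypothetical tiny page table: virtual page number -> physical frame number.
static const uint64_t page_table[4] = { 7, 3, 0, 5 };

uint64_t translate(uint64_t vaddr) {
    uint64_t vpn    = vaddr >> PAGE_SHIFT;      // virtual page number
    uint64_t offset = vaddr & (PAGE_SIZE - 1);  // byte offset within the page
    uint64_t pfn    = page_table[vpn];          // really a page-table walk + TLB
    return (pfn << PAGE_SHIFT) | offset;        // physical address
}

int main(void) {
    uint64_t va = 0x1A30;                       // page 1, offset 0xA30
    printf("VA 0x%llx -> PA 0x%llx\n",
           (unsigned long long)va, (unsigned long long)translate(va));
    return 0;
}
```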

The lecture then transitioned to describing the physical memory system, which implements this storage functionality using hierarchical organization and specific technologies.

Physical Memory System

The physical memory system is organized hierarchically to balance capacity, performance, and cost. It is built from fundamental storage elements (bit cells) arranged in 2D arrays.

As mentioned at the beginning, the video recommended there explains this subject much better, with cool 3D visuals.

Memory Array

The basic storage structure is a 2D array of bit cells. Access involves providing an address, split into a row address and a column address. A row decoder selects a specific row (wordline), activating it. All bit cells in the activated row are simultaneously connected to column circuitry (bitlines and sense amplifiers). Readout circuitry, including a column decoder/multiplexer, selects the specific data bits from the activated row buffer based on the column address.
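
To make the address split concrete, here is a small C sketch with an assumed geometry of 512 rows × 128 columns (illustrative numbers, not a real part): the high-order address bits drive the row decoder, and the low-order bits select a column out of the activated row buffer.

```c
#include <stdint.h>
#include <stdio.h>

// Assumed toy geometry: 512 rows x 128 columns => 9 row bits, 7 column bits.
#define COL_BITS 7
#define ROW_BITS 9

int main(void) {
    uint32_t addr = 0x1A5B & ((1u << (ROW_BITS + COL_BITS)) - 1);
    uint32_t col  = addr & ((1u << COL_BITS) - 1);  // selects bits from the row buffer
    uint32_t row  = addr >> COL_BITS;               // drives the row decoder / wordline
    printf("addr 0x%04x -> row %u, column %u\n", addr, row, col);
    return 0;
}
```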

Memory Technologies

  • SRAM (Static Random Access Memory): Fast; volatile, but retains data as long as power is supplied, with no refresh needed. Typically uses 6 transistors per bit, configured as cross-coupled inverters plus access transistors. Expensive, lower density. Used for fast caches (L1, L2).
  • DRAM (Dynamic Random Access Memory): Slower; uses one transistor and one capacitor per bit, storing data as a charge level. Cheaper, higher density. Requires periodic refresh because the capacitor leaks charge (each cell must be read and its charge restored). Reads are destructive (the cell shares its charge with the bitline), so the activated row’s data must be written back to the array after sensing. Used for main memory (larger capacity). A back-of-the-envelope refresh-cost sketch follows this list.
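
To get a feel for the cost of refresh mentioned in the DRAM bullet, here is the promised back-of-the-envelope sketch. The parameter values (64 ms retention window, 8192 refresh commands per window, roughly 350 ns per command) are assumed, illustrative numbers rather than figures from the lecture or a specific datasheet.

```c
#include <stdio.h>

int main(void) {
    // Assumed, illustrative refresh parameters (not from a specific datasheet).
    double retention_ms = 64.0;   // every cell must be refreshed within this window
    int    refresh_cmds = 8192;   // refresh commands issued over that window
    double t_rfc_ns     = 350.0;  // time the chip is busy per refresh command

    double busy_ns   = refresh_cmds * t_rfc_ns;  // total time spent refreshing
    double window_ns = retention_ms * 1e6;       // 64 ms expressed in ns
    printf("Refresh overhead: %.1f%% of all time\n", 100.0 * busy_ns / window_ns);
    return 0;
}
```

With these assumed numbers the chip spends a few percent of all time refreshing, time during which it cannot serve regular accesses.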

Other technologies like Phase Change Memory (PCM) and STT-MRAM are being explored for their different characteristics, particularly non-volatility and endurance trade-offs compared to DRAM.

Hierarchy

To build large memories efficiently, smaller arrays are organized hierarchically:

  • Banks: A larger memory is partitioned into independent banks. Each bank is a collection of memory arrays and associated circuitry. Banking allows multiple memory accesses to proceed concurrently or in an interleaved fashion, provided the accesses target different banks. This is crucial for achieving higher memory throughput.
  • Chips: A DRAM chip contains multiple banks (typically 8 or 16). Banks within a chip share external pins for command, address, and data buses to reduce chip cost and pin count.
  • Ranks: Multiple chips (e.g., 8 chips for a 64-bit data bus) are connected in parallel to form a rank. Ranks share command and address buses but provide wider data access by aggregating data from multiple chips (bit slicing).
  • DIMMs (Modules): A physical board containing one or more ranks (e.g., chips on front and back).
  • Channels: Connect one or more DIMMs to the processor’s memory controller on the motherboard. Multiple channels increase the total memory bandwidth available to the processor (a worked capacity/bandwidth example follows this list).
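
As promised above, a rough worked example of how the levels multiply into capacity and bandwidth. The configuration (2 channels, 1 DIMM per channel, 2 ranks per DIMM, 8 chips of 8 Gbit per rank, 3200 MT/s on a 64-bit bus) is hypothetical and chosen only for illustration.

```c
#include <stdio.h>

int main(void) {
    // Hypothetical DDR4-style configuration (illustrative numbers only).
    int channels       = 2;        // independent channels to the controller
    int dimms_per_chan = 1;
    int ranks_per_dimm = 2;
    int chips_per_rank = 8;        // eight x8 chips supply a 64-bit data bus
    long long gbit_per_chip = 8;   // 8 Gbit per DRAM chip

    long long capacity_gbit = (long long)channels * dimms_per_chan *
                              ranks_per_dimm * chips_per_rank * gbit_per_chip;
    printf("Total capacity: %lld GiB\n", capacity_gbit / 8);

    // Peak bandwidth: each channel moves 8 bytes per transfer at 3200 MT/s.
    double mts = 3200.0;
    double gb_per_s = channels * 8.0 * mts / 1000.0;
    printf("Peak bandwidth: %.1f GB/s\n", gb_per_s);
    return 0;
}
```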

Banks themselves are physically subdivided into smaller subarrays, each with its own local sensing circuitry.

DRAM Bank Operation Cycle

Accessing data in a DRAM bank involves a sequence:

  1. ACTIVATE (Row Access): The row address (RAS - Row Address Strobe) is sent, decoded by the row decoder, asserting a wordline. This connects the cells in that row to the sense amplifiers (row buffer). The sense amplifiers sense the small charge perturbation on the bitlines, amplify the data, and latch the entire row into the row buffer. This takes significant time due to the physics of sensing small charges over long bitlines.
  2. READ/WRITE (Column Access): Once a row is active in the row buffer, column addresses (CAS - Column Address Strobe) are sent. The column decoder selects the specified data bits, which are then read out of, or written into, the row buffer. Subsequent accesses to the same activated row (row buffer hit) are much faster, as they only require decoding a new column address and accessing the data already in the row buffer.
  3. PRECHARGE: Before activating a different row in the same bank, the currently active row must be closed. A precharge command is sent, which typically involves writing the data back from the sense amplifiers to the array cells (due to destructive read) and restoring the bitlines to a stable reference voltage. This prepares the bitlines for the next activate command.

The performance of memory access is significantly impacted by the need to perform these sequential steps. Random accesses often incur the full activate-read/write-precharge cycle latency, while accesses to data within the same active row are much faster. Banking and interleaving help overlap these latencies by allowing consecutive accesses to target different banks.
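
A minimal C sketch of this behavior, modeling a single bank with an open-row policy and made-up latency constants (15 ns each for activate, column access, and precharge; illustrative only), shows why row buffer hits are cheap and row conflicts pay the full cycle:

```c
#include <stdio.h>

// Toy timing model of one DRAM bank under an open-row policy.
// Latency values are illustrative, not taken from any datasheet.
enum { T_ACTIVATE = 15, T_CAS = 15, T_PRECHARGE = 15 };  // nanoseconds

static int open_row = -1;   // -1 means the bank is precharged (no open row)

static int access_latency(int row) {
    if (row == open_row)                  // row buffer hit: column access only
        return T_CAS;
    if (open_row == -1) {                 // bank precharged: activate + column access
        open_row = row;
        return T_ACTIVATE + T_CAS;
    }
    open_row = row;                       // row buffer conflict: close old row first
    return T_PRECHARGE + T_ACTIVATE + T_CAS;
}

int main(void) {
    int rows[] = { 3, 3, 3, 7, 3 };       // a few hits, then a conflict
    for (int i = 0; i < 5; i++)
        printf("access row %d -> %d ns\n", rows[i], access_latency(rows[i]));
    return 0;
}
```

In this toy model, the repeated accesses to row 3 cost only the column access time, while switching to row 7 (and back) pays precharge plus activate plus column access.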

This detailed understanding of memory organization and operation, particularly the behavior of DRAM banks, is crucial for designing effective memory hierarchies (caches) and controllers, which will be the focus of subsequent lectures aimed at mitigating the memory bottleneck.

Continue here: 22 Memory Hierarchy, Caching, Direct-Mapped, Set-Associative, Fully Associative Caches