Caches: Notes and Things to Try
These notes accompany the cache simulator. Open it alongside this page and try the experiments below.
The Big Idea
Memory is slow. Caches are small, fast copies of recently-used data that sit between the CPU and main memory. When the CPU needs a byte, it checks the cache first. If the data is there (hit), great — fast. If not (miss), it fetches a whole block of bytes from memory into the cache, hoping nearby data will be needed soon.
The question that drives everything: which data lives in which part of the cache?
Address Decomposition
Every memory address is split into three fields:
| Field | Bits | Purpose |
|---|---|---|
| Tag | high bits | Identifies which block is stored in this cache line |
| Set index | middle bits | Selects which set to look in |
| Block offset | low bits | Selects a byte within the block |
The simulator color-codes these: tag │ set │ offset. Watch how different addresses land in different sets.
Cache Parameters
- S = 2^s — number of sets
- E — lines per set (associativity)
- B = 2^b — block size in bytes
Total cache size = S × E × B bytes.
A direct-mapped cache has E=1 (one line per set). A fully associative cache has S=1 (one set holding all lines). Everything in between is E-way set associative.
Things to Try
1. Spatial locality in action. Select Row access: sum += a[0][i] with the default cache (s=2, E=1, b=4). Step through. Notice the pattern: miss, hit, hit, hit, miss, hit, hit, hit... Each miss loads a 16-byte block (four ints). The next three accesses land inside that same block for free. That's spatial locality — accessing nearby addresses benefits from the block that was already fetched.
2. When spatial locality fails. Switch to Column access: sum += a[i][0]. Same cache, same array — but now the stride jumps a whole row (32 bytes). Every access lands in a different block. Miss rate: 100%.
3. The conflict miss. Select Two-array conflict with the direct-mapped default. Arrays a and b are placed so they map to the same cache sets. Every access to b evicts a's block, and vice versa — 100% miss rate even though there was room in the cache overall. Now change E from 1 to 2. Miss rate drops to 25%. Same total cache size, just organized differently.
4. Row-major vs. column-major. Compare Row-major: a[i][j] (N=8) and Column-major (N=8). Row-major: 25%. Column-major: 100%. Same data, same cache, different loop order. This is why loop order matters in C.
5. Writes and dirty bits. Select Init: a[0][i] = i with write-back. Step through and watch dirty bits (D) appear in the cache. Now switch to write-through — the dirty bits vanish because every write goes straight to memory. Switch to no-write-allocate — the cache stays empty because write misses skip the cache entirely.
6. Read-modify-write. Select Read-modify-write: a[0][i] = 2*a[0][i]. Each iteration has two accesses: a read (which may miss) and a write (which always hits, because the read just loaded the block). Watch the memory values double as writes land.
Types of Cache Miss
Cold (compulsory) — First time this block has ever been accessed. Unavoidable. The simulator marks these in blue.
Conflict — The block was here before, but got evicted because another block mapped to the same set, even though the cache had free space elsewhere. Marked in yellow. Increasing associativity (E) reduces conflict misses.
Capacity — The working set exceeds the total cache size: every line was in use, so a block that's still needed had to go. Marked in red. Only a bigger cache (or a smaller working set) helps.
Write Policies
Write-back — Writes update only the cache. The modified line is marked dirty. Memory is updated later, when the dirty line gets evicted. Faster (fewer memory writes) but more complex.
Write-through — Writes update both the cache and memory immediately. Simpler, but generates more memory traffic.
Write-allocate — On a write miss, fetch the block into the cache, then write. Usually paired with write-back.
No-write-allocate — On a write miss, write directly to memory without bringing the block into the cache. Usually paired with write-through.
Replacement Policy
When a set is full and a new block must be loaded, the cache must choose which existing line to evict. LRU (Least Recently Used) evicts the line that hasn't been accessed for the longest time. In the simulator, the ↻ number on each line shows how many accesses ago it was last touched — the highest number gets evicted next.
Key Formulas
Miss rate = misses / (hits + misses)
Average memory access time = hit time + miss rate × miss penalty
A typical hit time might be 1 cycle; a miss penalty might be 100 cycles. Even a small miss rate is expensive: 5% miss rate → 1 + 0.05 × 100 = 6 cycles average.