

CS 134: Operating Systems

More Memory Management

#### Overview

- Segmentation Recap
- Paging



#### Segmentation (Recap)


Logical address consists of the pair `<segment-number, offset>`

## Example

Use 32-bit logical address

- High-order 8 bits are segment number
- Low-order 24 bits are offset within segment

256 segments, of max size 16,777,216 bytes (16MB)

### Segment Table on CPU



The processor needs to map 2D user-defined addresses into 1D physical addresses.

In the segment table, each entry has:

- Base—Starting address of the segment in physical memory
- Limit—Length of the segment


## **Segment Translation**







#### Segmentation Architecture




- Relocation
  - Dynamic
  - By segment table
- Sharing
  - Shared segments
  - Same segment number
- Allocation
  - First fit/best fit
  - External fragmentation

## **Class Exercise**

Do shared segments *need* to have the same segment number?

- If so, why?
- If not, why? (Why might we give them the same segment number anyway?)


## **Class Exercise**

Does our segmentation scheme capture the *difference* between code and data segments?

If not, what would we need to fix it?

### **Class Exercise**

What if a program wants more contiguous data space than a segment can hold? Is this a problem?

# Paging

#### Properties

- All pages are the same size (e.g., 4K)
- No need for limit registers
- No longer reflect program structure
- Physical locations for pages are called page frames

But...



Now have a *lot* of pages.

- 4K pages & 32-bit logical address $\Rightarrow$ 20-bit page number, 12-bit offset
- 20-bit page number $\Rightarrow$ 1,048,576 possible pages!
- Too many to remember inside processor


### Sparsely Filled Address Spaces

For example,

- Nothing at address zero (why?)
- Code low down in memory
- Static and heap data after code (room to grow up)
- Stack high up (room to grow down)
- Kernel really high up

# Solution (?)

Two-level (or three-level) page tables

- 10-bit upper page number (0-1023)
- 10-bit lower page number (0-1023)
- 12-bit offset (0-4095)






Huh?

## Zero-Level Page Table






### Page Table Design Objectives



#### Here's what we want:

- Needs to be in memory
- Size is O(frames)
- Want O(1) performance
- Needs to act like a TLB, i.e.,
  - Can be seen as "just a big cache"
  - Maps pages  $\rightarrow$  frames
  - Don't want to have to flush it all the time


#### Inverted Page Tables



- One row per physical frame, with reverse mapping
- Given virtual address, how to find physical one?
  - Basically a search problem
  - Hash tables to the rescue!

Question: Is the hash table bigger than the number of frames?

## Hashed (Inverted) Page Tables






#### A Question

> …method of interprocess communication. Some operating systems implement shared memory using shared pages.
>
> Systems that use inverted page tables have difficulty implementing shared memory. Shared memory is usually implemented as multiple virtual addresses (one for each process sharing the memory) that are mapped to one physical address. This standard method cannot be used, however, as there is only one virtual page entry for every physical page, so one physical page cannot have two (or more) shared virtual addresses.
>
> Organizing memory according to pages provides numerous other benefits…

*Operating System Concepts*, Silberschatz & Galvin

Does this claim make sense?



# **Processors Compared**



|             | Physical addrs | Virtual addrs | TLB size | Segments | Pages  | Hashed page tables |
|-------------|----------------|---------------|----------|----------|--------|--------------------|
| Pentium 4   | 36-bit         | 32-bit        | 64       | varied   | 4k, 4M | —                  |
| Opteron     | 40-bit         | 48-bit        | 1088     | varied   | 4k, 4M | —                  |
| Itanium 2   | 50-bit         | 64-bit        | 4 × 32   | —        | 4k–4G  | —                  |
| PowerPC 604 | 32-bit         | 52-bit        | 256      | < 256MB  | 4k     | Yes                |
| PowerPC 970 | 42-bit         | 64-bit        | 1024     | < 256MB  | 4k     | Yes                |
| UltraSparc  | 36-bit         | 64-bit        | 64       | —        | 8k–4M  | Yes                |
| Alpha       | 41-bit         | 64-bit        | 256      | —        | 8k–4M  | —                  |
| MIPS R3000  | 32-bit         | 32-bit        | 64       | —        | 4k     | —                  |



Programs do not need all their code all the time...

## Overlays / Dynamic Loading




dlopen maps a file into the address space and returns an opaque handle.

dlsym looks up a symbol in a dynamically loaded file.

#### On modern Unix systems

- `handle = dlopen(filename, mode)`
- `addr = dlsym(handle, sym)`
- `err = dlclose(handle)`

Issues...?



#### Memory Recap

We now have a memory scheme where

- Programs use *logical addresses*
- Memory sharing is easy
- Processes are either in memory or swapped out
- Hardware can detect *invalid* memory accesses to trap to the OS

We can already "swap out" whole programs, but can we do better...?

# Demand Paging



#### Need to

- Bring a page into memory only when it is needed.
  - Less I/O needed
  - Less memory needed
  - Faster response
  - More users & processes
- Mark pages not in memory as *invalid* in page table

When a program accesses an invalid page, there are two possibilities...

#### Demand Paging—Hardware Support

Thus,

- Invalid accesses generate a trap
- Need to restart program after the trap
- Must seem like "nothing happened"

#### Example: the C code

`*--mystack = new_item;`

might be implemented as a single instruction:

`mov -(r6),r1`

## **Class Exercise**

Why is this instruction potentially problematic?





What needs to happen when a page fault occurs?

## Page Faults



What happens...

- User process accesses invalid memory—traps to OS
- OS saves process state
- OS checks access was actually legal
- Find a free frame
- Read from swap to free frame—I/O wait, process blocked
- Interrupt from disk (I/O complete)—process ready
- Scheduler restarts process—process running
- Adjust page table
- Restore process state
- Return to user code

### Page Faults (cont.)




- How long?
  - Disk is slow
  - 5–15 ms is a conservative guess
  - Main memory takes 5–15 ns
  - Page fault is about 1 million times slower than a regular memory access
  - Page faults must be rare! (Need locality!)

#### A "Back of an Envelope Calculation"

How often are there page faults?

#### An example from a desktop machine:

- In 14 days
  - 378,110 page-ins
  - Average load < 4%  $\rightarrow$  12 hours actual compute time
  - 8.75 page faults per second average
- 1,000,000,000 memory accesses per second (a guess)
- 43,200,000,000,000 memory accesses in 12 hours
- 1 page-in every 114,252,466 memory accesses
- Using 5 ns for memory, 5 ms for disk:
  - $t_{avg} = (5{,}000{,}000 \times 1 + 5 \times 114{,}252{,}465)/114{,}252{,}466$
  - $t_{avg} \approx 5.04\ \text{ns}$



#### Page Faults (cont.)



What kind of other tricks? Well, for example, debugging and tracing; VM translation; buffer overflow prevention.

Other kinds of page faults:

- Demand-page executables from their files, not swap device
- Copy-on-write memory—great for fork
- Lazy memory allocation
- Other tricks...