CS 134

Named Segments vs. Logical Addressing

In the last lesson, we introduced the concept of segmentation and mentioned how it was implemented on the x86 CPU, with a small number of named segments (CS, DS, SS, ES, FS, and GS), and special prefixes in instructions to override the default segment registers.

Let's see why this might not be such a great idea.

Example: Segment-Aware memcpy

Suppose we want to provide a memcpy function that can copy data from one segment to another. Because addresses don't intrinsically include information about which segment they're in, we need to pass the segment information as additional arguments to the function, so the interface looks like

enum segment_t {
    SEG_DS = 0,
    SEG_ES = 1
};

void seg_memcpy(void *src, void *dest, size_t size,
                enum segment_t src_seg, enum segment_t dest_seg);

and calling the function from C code would look like

     seg_memcpy(food, plate, meal_size, SEG_DS, SEG_ES);

Because the C language doesn't have built-in support for segment specifiers, we have to write the function in assembly language. Here's the code:

# Segment-aware memcpy function
# Arguments (relative to %esp on entry):
#   4(%esp)  - source address
#   8(%esp)  - destination address
#   12(%esp) - size
#   16(%esp) - source segment (0 for DS, 1 for ES)
#   20(%esp) - dest segment (0 for DS, 1 for ES)

.text
.globl seg_memcpy
seg_memcpy:
    pushl %ebp
    movl %esp, %ebp
    pushl %esi
    pushl %edi
    pushl %ecx

    movl 8(%ebp), %esi    # source address
    movl 12(%ebp), %edi   # destination address
    movl 16(%ebp), %ecx   # size
    testl %ecx, %ecx      # nothing to copy?
    jz done

    # Combine the segment arguments into a jump-table index:
    # index = src_seg + 2 * dest_seg
    movl 24(%ebp), %eax   # dest segment
    shll $1, %eax         # becomes bit 1 of the index
    orl 20(%ebp), %eax    # source segment becomes bit 0
    jmp *jump_table(,%eax,4)

jump_table:
    .long copy_ds_ds
    .long copy_es_ds
    .long copy_ds_es
    .long copy_es_es

copy_ds_ds:
    # Both source and destination use DS (default)
    rep movsb
    jmp done

copy_es_ds:
    # Source uses ES, destination uses DS
.loop_es_ds:
    movb %es:(%esi), %al
    movb %al, (%edi)
    incl %esi
    incl %edi
    decl %ecx
    jnz .loop_es_ds
    jmp done

copy_ds_es:
    # Source uses DS, destination uses ES
.loop_ds_es:
    movb (%esi), %al
    movb %al, %es:(%edi)
    incl %esi
    incl %edi
    decl %ecx
    jnz .loop_ds_es
    jmp done

copy_es_es:
    # Both source and destination use ES
.loop_es_es:
    movb %es:(%esi), %al
    movb %al, %es:(%edi)
    incl %esi
    incl %edi
    decl %ecx
    jnz .loop_es_es

done:
    popl %ecx
    popl %edi
    popl %esi
    popl %ebp
    ret
  • Hedgehog speaking

    Gah! I don't speak assembly!

  • PinkRobot speaking

    We're not asking you to write assembly, just look over it and get a high-level understanding of what it's doing. We'll outline the important aspects below.

The assembly code contains four copy loops:

  • copy_ds_ds: Both source and destination use the DS segment.
  • copy_es_ds: Source uses the ES segment, destination uses the DS segment—the %es:(%esi) syntax creates a machine instruction that uses the ES segment for the source address.
  • copy_ds_es: Source uses the DS segment, destination uses the ES segment—the %es:(%edi) syntax creates a machine instruction that uses the ES segment for the destination address.
  • copy_es_es: Both source and destination use the ES segment.

Based on the values passed in for the source and destination segments, the code jumps to the appropriate copy loop.
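For reference, the dispatch computation can be written out in C (a sketch using the enum values from the interface above): the source segment supplies bit 0 of the index and the destination segment supplies bit 1.

```c
#include <assert.h>

/* Jump-table index for the four copy loops:
 * index = src_seg + 2 * dest_seg. */
static int copy_loop_index(int src_seg, int dest_seg) {
    return src_seg | (dest_seg << 1);
}
```

Index 0 selects copy_ds_ds, 1 selects copy_es_ds, 2 selects copy_ds_es, and 3 selects copy_es_es, matching the order of the entries in the jump table.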

Suppose we wanted to support specifying any of the six segment registers for the source and destination. Assuming we coded it the same way as above, how many copy loops would we need?

More broadly, what is the fundamental design error with segments on the x86 architecture? (Hint: It's not with the concept of segments; it's with how we need to use them.)

  • Dog speaking

    Okay, so I see that it's got four loops and it'd be 36 loops if we wanted to support all six segment registers (6 × 6 = 36). But I'm not sure what the fundamental design error is. I mean, it seems like it's working, right?

  • Cat speaking

    I think I see it. The fundamental design error is that the segment specification is in the code—the instructions—not in the data. If the segment was part of the address, we wouldn't need all these loops.

  • Horse speaking

    Whoa! How could the segment be part of the address?

Imagining a Different World

Let's head back in time to 1981. Intel is designing the 80286, set to launch in 1982. The original 8086 has four named segments (CS, DS, SS, and ES), and adding more named segments seems like a good idea, which is what Intel eventually did (FS and GS arrived with the 80386). But let's imagine a different 80286: one that jumps straight to 32-bit addressing, a massive leap forward compared to the 16-bit addressing of the 8086, and that rethinks how segments work along the way.

Suppose they'd decided to have eight segments. But instead of naming them, they just numbered them and made them part of the logical address. Specifically, they used the top 3 bits of the address to specify the segment, and the remaining 29 bits to specify the offset within the segment so that every address intrinsically says what segment it belongs to.
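In C (with hypothetical helper names), pulling the two fields out of such a 32-bit logical address is just a shift and a mask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 3-bit-segment / 29-bit-offset split of a 32-bit
 * logical address. */
#define OFFSET_BITS 29
#define OFFSET_MASK ((1u << OFFSET_BITS) - 1)   /* 0x1fffffff */

static uint32_t segment_of(uint32_t logical) {
    return logical >> OFFSET_BITS;              /* top 3 bits */
}

static uint32_t offset_of(uint32_t logical) {
    return logical & OFFSET_MASK;               /* bottom 29 bits */
}
```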

Our hardware now looks like

[Figure: the top 3 bits of the logical address index an 8-entry segment table of (base, limit) pairs; the 29-bit offset is checked against the limit (TRAP if it is too large) and added to the base to form the physical address.]
Segment Translation
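As a sketch in C (the names here are ours, not from any real hardware interface), the translation step behaves roughly like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One entry per segment: where the segment starts in physical memory
 * (base) and how many bytes of it are valid (limit). */
struct seg_entry {
    uint32_t base;
    uint32_t limit;
};

static struct seg_entry segment_table[8];   /* indexed by the top 3 bits */

/* Translate a logical address to a physical one; returns false to
 * model the TRAP case where the offset exceeds the segment's limit. */
static bool translate(uint32_t logical, uint32_t *physical) {
    uint32_t seg    = logical >> 29;
    uint32_t offset = logical & 0x1fffffffu;
    if (offset >= segment_table[seg].limit)
        return false;                       /* past the limit: trap */
    *physical = segment_table[seg].base + offset;
    return true;
}
```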
  • Duck speaking

    29 bits isn't much. That's only 512 MB per segment.

  • PinkRobot speaking

    Remember, we're in 1982. At that time 512 MB was a massive amount of memory. Physically, the 80286 could only address 16 MB of RAM, so it wasn't going to be a worry for a while.

  • Duck speaking

    Okay, but they're kicking the can down the road. That check is going to come due eventually.

  • PinkRobot speaking

    Let's think about that…

512 MB Isn't Enough for Everyone

Let's suppose that some people in the press begin to suggest that soon 512 MB won't be enough memory given how fast the field is advancing, and speculate about some programs that might need a full gigabyte of data. Is our (imagined) design going to be able to handle that requirement?

Is there a trick you can come up with—without changing anything about our CPU design—that would allow a program to access 800 MB of contiguous (logical) memory? (Hint: Just because the segments are 512 MB doesn't mean you can't have more than one segment.)

  • Duck speaking

    I see! If we use two adjacent segments, we can access 800 MB of contiguous memory. The CPU will automatically switch to the other segment when we cross the boundary.

  • Dog speaking

    Wait, why? Do we need some extra trick to make that happen?

  • Duck speaking

    No, it's just the way binary arithmetic works.

Let's imagine we use segments four and five to hold our data, with segment four holding its maximum 512 MB and segment five holding the remaining 288 MB. Here's how we'd divide up the memory:

Where                  Logical Address                   Segment  Offset
Start of segment four  10000000000000000000000000000000  4        0x00000000
End of segment four    10011111111111111111111111111111  4        0x1fffffff
Start of segment five  10100000000000000000000000000000  5        0x00000000
End of segment five    10110001111111111111111111111111  5        0x11ffffff

Notice that segment five starts immediately after segment four ends, and thus we have a contiguous range of logical addresses from 10000000000000000000000000000000 to 10110001111111111111111111111111, which is 800 MB.
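The "automatic switch" is nothing more than binary carry: adding 1 to the last address in segment four overflows the 29-bit offset field into the segment field, landing on the first address of segment five. A quick sanity check in C (constants taken from the table above):

```c
#include <assert.h>
#include <stdint.h>

/* Last byte of segment four and first byte of segment five,
 * as in the table above. */
#define END_OF_SEG4   0x9fffffffu   /* segment 4, offset 0x1fffffff */
#define START_OF_SEG5 0xa0000000u   /* segment 5, offset 0x00000000 */

/* Incrementing carries out of the offset bits into the segment bits. */
static uint32_t next_address(uint32_t logical) {
    return logical + 1;
}
```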

  • Hedgehog speaking

    So it doesn't matter if segments are small. If they all stand together, they can be as big as needed. It's like a metaphor for life.

  • PinkRobot speaking

    Yes.

If the designers were willing to have a bigger table of base and limit registers we could have other designs, such as

  • 8 bits for segment number and 24 bits for offset, giving 256 segments of 16 MB each.
  • 16 bits for segment number and 16 bits for offset, giving 64K segments of 64 KB each.
  • 20 bits for segment number and 12 bits for offset, giving over a million segments of 4 KB each.

All of these would work, using the adjacent-segments trick to get contiguous memory. It doesn't matter how small they are if we can just put them together next to each other.
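The arithmetic behind those alternatives is just exponents: with s bits of segment number in a 32-bit logical address, there are 2^s segments of 2^(32-s) bytes each.

```c
#include <assert.h>
#include <stdint.h>

/* For an s-bit segment-number field in a 32-bit logical address. */
static uint64_t num_segments(int seg_bits) {
    return 1ull << seg_bits;            /* 2^s segments */
}

static uint64_t segment_size(int seg_bits) {
    return 1ull << (32 - seg_bits);     /* 2^(32-s) bytes each */
}
```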

  • Horse speaking

    Whoa! That last one is a lot of segments. I don't think anyone was going to do that on a chip in the mid 1980s. And why would we need a million segments anyway?

  • Cat speaking

    And they wouldn't correspond to conceptual memory regions, like the code, the heap, the stack, and so on.

  • Pig speaking

    When we put a bunch of segments together to make a bigger memory area, most of the segments will be FULL, with only the one at the very end of a chain of adjacent segments being partially used.

  • Goat speaking

    Meh, so it's hardly even worth having the limit registers. Throw them away and call it good.

  • PinkRobot speaking

    Actually, that's a good idea.

  • Hedgehog speaking

    What? Why?

Tiny Fixed-Size Segments; Farewell to Limit Registers

If our segments were really small, like 4K, we could just have every segment be at its maximum size and not need limit registers at all. It also means that every allocation of physical memory for a segment will be the same size, 4K. If all allocations are the same size, we don't have to worry about external fragmentation, because every hole will be a multiple of 4K.
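Translation then loses the limit check entirely. A sketch in C (using a hypothetical array of base addresses; the real data structure is the subject of the next lesson):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  0xfffu   /* low 12 bits: offset within a 4K segment */

/* With every segment exactly 4K, translation is just a table lookup
 * plus the offset: no limit register, no limit check. */
static uint32_t translate_fixed(const uint32_t *bases, uint32_t logical) {
    uint32_t index  = logical >> PAGE_SHIFT;
    uint32_t offset = logical & PAGE_MASK;
    return bases[index] + offset;
}
```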

  • Cat speaking

    And internal fragmentation will be modest. For any area of memory we're chaining segments together to make, it'll just average out to half a segment per chain, or 2K.

Of course, whereas before our segments corresponded to conceptual regions of a process's memory (e.g., the code, the stack, the heap, etc.), now they don't. They're little pieces that get chained together to make those conceptual regions. So perhaps we shouldn't call them segments anymore. Let's call them pages.

  • Horse speaking

    Whoa! I remember pages from CS 105. I never thought this was where we'd end up.

  • Dog speaking

    Mind blown!! 🤯

  • Hedgehog speaking

It's cool that it's like a spectrum from one thing to the other, but I'm a bit concerned. Sure, we got rid of the limit registers, but we've still got like a million base addresses to keep track of. And that's on a 32-bit system. With 64 bits, we're talking four quadrillion base addresses. That's way too many.

  • Goat speaking

    Meh. Just give up then.

  • PinkRobot speaking

    Actually, yes. We won't store all the information about where every single page is inside the CPU. We'll keep it in memory in some kind of data structure, which we'll call a page table. And that's what we'll discuss on the next lesson page.

  • Goat speaking

    Meh. Gotta be better than 80286 fantasy land.

Deeper Dive: PDP-11 Segmentation — Origin of the Segfault

The earliest version of Unix ran on the PDP-7, which didn't have any kind of memory-protection support. The approach the designers used was mentioned last class—a single process in memory at a time with swapping to allow multiple processes to run. That version was a very preliminary prototype, where running the game Space Travel was perhaps the most important design consideration.

The first real version of Unix (First Edition Unix, a.k.a. V1), debuted in 1971 on the PDP-11, which added a memory-management unit. In fact, it used segmentation, and used the top three bits of the logical address to specify the segment, exactly like our fantasy 80286. Here's an image from the design document for the PDP-11/20 that describes how it worked:

PDP-11 Segmentation Design
  • Horse speaking

    Whoa! The PDP-11 logical addresses were only 16 bits. That's not much.

  • PinkRobot speaking

    Yet Unix managed to run on that.

Because the PDP-11 provided memory protection via segmentation, it made sense to call a memory-access error when an address was outside of any valid segment—a segmentation fault or segmentation violation. This term has stuck around, even though modern systems don't use segmentation in the same way.

  • Dog speaking

    OMG! I had no idea that's where the term segfault came from.

  • Goat speaking

    Meh. What I've learned is that when Intel came out with the 80286 in 1982, the PDP-11 had been out for over a decade, doing segmentation the right way. Yet Intel went ahead and did it the wrong way. Classic Intel.
