System Call Simulation
I'm a bit anxious about Homework 4. I'm not sure I really understand how system calls work.
Okay. Let's build a system-call simulator that will help.
The User Side
Let's start with the user side of things. We'll write a program that uses a few system calls, but rather than using the standard system-call wrappers (from the C library), we'll use a generic function, system_call, that can make any system call provided by our simulated kernel. (We'll also include a uprintf function (that's similar to printf), so we can print debugging messages and see what's happening.)
#include "user.h"

int
main(void)
{
    long ret;
    int exit_code = 0;

    uprintf("User process started up.\n");

    ret = system_call(SYS_PRINT, (intptr_t) "Hello, Kernel!\n", 0, 0, 0);
    uprintf("SYS_PRINT returned: %ld\n", ret);

    ret = system_call(SYS_PRINT, 0, 0, 0, 0);
    uprintf("Errorneous SYS_PRINT returned: %ld (%s)\n", ret,
            strerror(errno));

    ret = system_call(SYS_GET_PID, 0, 0, 0, 0);
    uprintf("SYS_GET_PID returned: %ld\n", ret);

    uprintf("Calling SYS_EXIT\n");
    system_call(SYS_EXIT, exit_code, 0, 0, 0);

    uprintf("This should not be printed\n");
    return 0; /* This should never be reached */
}
Our system_call function takes a system-call number and up to four arguments. It returns the return value of the system call (which may also set the global errno variable). All the arguments are intptr_t values, which are integers that are the same size as pointers on the system, allowing us to cast pointers to integers and back again.
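To make the casting concrete, here is a minimal sketch of a pointer making the round trip through an intptr_t. The helper names (to_arg, arg_to_string, round_trip_ok) are invented for this illustration and are not part of the simulator.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: package a pointer as a generic system-call argument. */
intptr_t
to_arg(const void *ptr)
{
    return (intptr_t) ptr;
}

/* Hypothetical helper: recover a string pointer from a generic argument. */
const char *
arg_to_string(intptr_t arg)
{
    return (const char *) arg;
}

/* Returns 1 if a string survives the trip through intptr_t unchanged. */
int
round_trip_ok(const char *msg)
{
    intptr_t arg = to_arg(msg);

    return arg_to_string(arg) == msg
        && strcmp(arg_to_string(arg), msg) == 0;
}
```

Note that the cast only carries the address across. In the simulator (as in a real OS), the kernel still can't simply dereference that address, which is why the copy functions described later are needed.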
So system_call is a lot like the syscall instruction we saw in OS/161?
Wait, so why did you write your own?
Because we don't want to call the actual kernel—we want to simulate system calls.
The Kernel Side
Here's code on the kernel side of the divide that implements the system calls.
#include "kernel.h"

#define SIMULATED_PID 42
#define SYS_PRINT_LIMIT 4096 /* Maximum number of characters to print */

int
sys_print(const_userptr_t str, int *len_printed)
{
    char kernel_buf[SYS_PRINT_LIMIT];
    int err;

    /* Never touch user memory directly; copy the string in first */
    err = copyinstr(str, kernel_buf, sizeof(kernel_buf), NULL);
    if (err == 0) {
        fputs(kernel_buf, stdout);
        *len_printed = strlen(kernel_buf);
        return 0;
    } else {
        return err;
    }
}

int
sys_get_pid(int *pid)
{
    *pid = SIMULATED_PID;
    return 0;
}

int
sys_exit(int code)
{
    kprintf("User process exited with code %d\n", code);
    /* Terminate the user process */
    kill(user_pid, SIGTERM);
    return 0;
}

void
handle_syscall(struct syscall_args *args, struct syscall_result *result)
{
    int err, retval;

    kprintf("Handling syscall %d\n", args->num);
    switch (args->num) {
    case SYS_PRINT:
        err = sys_print((const_userptr_t) args->args[0], (int *) &retval);
        break;
    case SYS_GET_PID:
        err = sys_get_pid((int *) &retval);
        break;
    case SYS_EXIT:
        err = sys_exit(args->args[0]);
        user_connected = false;
        return;
    default:
        kprintf("Unknown syscall number: %d\n", args->num);
        err = ENOSYS;
        retval = -1;
    }
    result->err = err;
    result->ret_val = err ? -1 : retval;
    /* On error, retval may hold a stale value; the user sees ret_val (-1) */
    kprintf("Syscall %d returning %d, err=%d\n", args->num, retval, err);
}
Here, we've written a helper function for each system call, and a handle_syscall function that dispatches to the appropriate helper function based on the system-call number. The handle_syscall function also sets the return value and error code in the syscall_result structure the kernel will return to the user process.
As in the user code, the kernel needs to use a cast to convert the intptr_t arguments to the appropriate types.
Accessing User Memory
User memory is in a separate address space from the kernel, so the kernel can't directly access it. Instead, some kind of mechanism is needed to copy data between user and kernel memory. In OS/161, this is done with the copyin and copyout family of functions, which are declared as
int copyin(const_userptr_t usersrc, void *dest, size_t len);
int copyout(const void *src, userptr_t userdest, size_t len);
int copyinstr(const_userptr_t usersrc, char *dest, size_t len, size_t *got);
int copyoutstr(const char *src, userptr_t userdest, size_t len, size_t *got);
where
- copyin copies len bytes from the user-space address usersrc to the kernel-space address dest.
- copyout copies len bytes from the kernel-space address src to the user-space address userdest.
- copyinstr is like copyin, but it copies a null-terminated string; it writes the total number of bytes copied to got.
- copyoutstr is like copyout, but it copies a null-terminated string; it writes the total number of bytes copied to got.
All of these functions return an error code, with zero indicating success. Possible errors are
- EFAULT if the memory access is invalid.
- ENAMETOOLONG if the string is too long.
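To get a feel for these two error codes, here is a rough sketch of a copyinstr-like function working against a pretend user region. Everything in it (fake_user_mem, FAKE_USER_SIZE, sketch_copyinstr) is made up for illustration; it is not how OS/161 or our simulator implements copyinstr. Counting the terminator in got is also just a choice made for this sketch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for user memory: only this array counts as valid. */
#define FAKE_USER_SIZE 64
char fake_user_mem[FAKE_USER_SIZE] = "Hello, Kernel!\n";

/*
 * Copy a NUL-terminated string from the fake user region into dest, which
 * has room for len bytes. Returns 0 on success, EFAULT if the source strays
 * outside the valid region, or ENAMETOOLONG if dest fills up before the
 * terminator is found. On success, *got (if non-NULL) receives the number
 * of bytes copied, counting the terminator.
 */
int
sketch_copyinstr(const char *usersrc, char *dest, size_t len, size_t *got)
{
    uintptr_t lo = (uintptr_t) fake_user_mem;
    uintptr_t hi = lo + FAKE_USER_SIZE;
    uintptr_t src = (uintptr_t) usersrc;

    for (size_t i = 0; i < len; i++) {
        if (src + i < lo || src + i >= hi)
            return EFAULT;              /* address is not "user" memory */
        dest[i] = fake_user_mem[src + i - lo];
        if (dest[i] == '\0') {
            if (got != NULL)
                *got = i + 1;
            return 0;
        }
    }
    return ENAMETOOLONG;                /* ran out of room in dest */
}
```

Passing a null pointer as usersrc yields EFAULT here, which mirrors what the erroneous SYS_PRINT call in the user program runs into.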
So, the kernel can't directly access user memory, but it can copy data to and from user memory using copyin and copyout? How do those functions work?
Let's set that aside for now. It doesn't matter exactly how they work, only that in our simulator, literally the only way the kernel can access user memory is through these functions. And the same is true for OS/161 and most other operating systems.
Can you remind me why it's so important that we have this big wall between user and kernel memory? I never like fences.
The wall is there to protect the kernel from user processes. If a user process can write to kernel memory, it can overwrite the kernel and crash the system. If a user process can read kernel memory, it can read sensitive information like passwords or encryption keys. So the wall is there to keep the kernel safe.
But what about the other direction? If the kernel is all powerful, why can't it just read user memory directly?
Sometimes it's actually technically challenging to do a direct read (as in our simulator), but the main reason is that it makes a lot of sense for the kernel to be very cautious about reading user memory, as we'll discuss in a moment.
Is ENAMETOOLONG a good name for not having enough space to copy the string? It's not just returned for long names, but for any string that's too long.
Alas, the POSIX error codes are sometimes a bit overly specific. There is also ENOSPC, which is for when there's not enough space on a device like a disk; ENOMEM for when the system has run out of memory; and E2BIG for when the space required for program arguments is too big, but there is no generic “not big enough” error code. (Sigh.)
Error Handling
It's vitally important that the kernel never crash, even if the user provides bad input. One very nice thing about the memory-copy functions is that they will return an error code if the memory access is invalid, rather than crashing the kernel. Catching the problem allows the kernel to return an error code to the user process, which can then take appropriate action.
But it also means that in kernel code, we have to be fastidious about both checking the return values of functions that might return an error code, and taking appropriate action if an error occurs.
Putting It All Together
To make our user-space / kernel-space simulator work, we need some support code. For the most part, you can ignore the details, but here is how it's actually implemented:
- A Unix process for the “kernel”, which handles the system calls.
- A Unix process for the “user process”, which makes the system calls.
- A communications channel between the two processes, which is used to pass system-call numbers and arguments from the user process to the kernel process, and to pass return values and error codes back.
- The communications protocol also handles requests from the kernel to read and write user memory.
Can you just let us play with the simulator?
Absolutely, that's the whole point.
Meh. Took you long enough to get there.
Alternatively, you can download the code and run it on your own machine. You can compile it with cc -g -Wall -o kern-sim *.c and run it with ./kern-sim.
In either case, the two most important files to look at are user.c and kernel.c.
Run it and take a look at the output. If you look at the code, you'll see we added a little bit of extra code to the user program compared to the code we showed above.
When you run the code, you should see output like
- kern: Kernel-space booting.
- user: User process started up.
- kern: Handling syscall 1
Hello, Kernel!
- kern: Syscall 1 returning 15, err=0
- user: SYS_PRINT returned: 15
- kern: Handling syscall 1
ABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOPQRSTUVWXYZABCDEFGHIJKLMNOP
- kern: Syscall 1 returning 1499, err=0
- user: SYS_PRINT returned: 1499 (Success)
- kern: Handling syscall 1
- kern: Syscall 1 returning 1499, err=14
- user: Errorneous SYS_PRINT returned: -1 (Bad address)
- kern: Handling syscall 2
- kern: Syscall 2 returning 42, err=0
- user: SYS_GET_PID returned: 42
- user: Calling SYS_EXIT
- kern: Handling syscall 3
- kern: User process exited with code 0
- kern: Kernel-space shutting down.
Going Further, Part 1
Let's fork this code, and add another system call, SYS_GET_TIME. You have two choices for how to implement this:
- Return the time as the syscall's return value, much like SYS_GET_PID returns a fixed value.
- Take a pointer to a long value in user memory, and return the time in that location. This option will give you practice with copyout.
You can use the time function from the standard C library to get the current time, and then return it to the user process.
You'll need to
- Edit common.h to add a new system-call number. Find the line that says enum syscall_num { SYS_PRINT = 1, SYS_GET_PID, SYS_EXIT }; and add a new entry for SYS_GET_TIME to the end of the list.
- Edit kernel.c to add a new function sys_get_time that takes a userptr_t argument pointing to a long value in user-space memory, and stores the current time (in seconds since the Unix epoch) there. The function should return 0 on success, and an error code on failure.
  - You'll need to #include <time.h> to use the time function.
- Edit kernel.c to add a case for SYS_GET_TIME to the handle_syscall function, which calls sys_get_time and sets the return value and error code in the syscall_result structure.
- Edit user.c to add a call to SYS_GET_TIME and print the result with uprintf (use the %ld format specifier to print a long value).
Try running the code with your new system call. You should see the current time in seconds since the Unix epoch printed out, looking something like this:
- kern: Handling syscall 4
- kern: Syscall 4 returning 0, err=0
- user: SYS_GET_TIME returned: 0, time: 1727739692
Except that instead of 1727739692, you'll see a larger number, since the value printed is the actual number of seconds since the Unix epoch at the moment you run the code.
What is the Unix epoch?
The Unix epoch is the time that Unix systems use as the reference point for time: 00:00:00 UTC on January 1, 1970. The time function returns the number of seconds that have passed since that instant, and it's the basis for all time calculations on Unix systems.
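If you'd like to see what one of these second counts means as a calendar date, a small helper along these lines can convert it using the standard gmtime and strftime functions. The function name format_epoch_utc is our own invention for this sketch.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format a count of seconds since the Unix epoch as a UTC timestamp.
 * Returns the number of characters written, or 0 on failure.
 */
size_t
format_epoch_utc(time_t seconds, char *buf, size_t buflen)
{
    struct tm *utc = gmtime(&seconds);

    if (utc == NULL)
        return 0;
    return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S UTC", utc);
}
```

Given 0, this produces 1970-01-01 00:00:00 UTC (the epoch itself), and the sample value 1727739692 from the output above comes out as 2024-09-30 23:41:32 UTC.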
The change to kernel.c might look something like
int
sys_get_time(userptr_t timeptr)
{
    long now = time(NULL);

    return copyout(&now, timeptr, sizeof(now));
}
and adding this case to handle_syscall
    case SYS_GET_TIME:
        err = sys_get_time((userptr_t) args->args[0]);
        retval = 0;
        break;
The change to user.c might look something like
    long now;

    ret = system_call(SYS_GET_TIME, (intptr_t) &now, 0, 0, 0);
    uprintf("SYS_GET_TIME returned: %ld, time: %ld\n", ret, now);
If you want to see the full working code, you can download the completed code.
Going Further, Part 2
If you want to go even further, you could implement a SYS_INPUT system call that reads a line of input from the user and returns it to the user process. You could use the getline function from the standard C library to read a line of input from the user, which allows arbitrary-length inputs. Be sure to free the memory that getline allocates after you're done with copying the string to user memory.
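As a starting point, here is a rough sketch of the getline-then-free pattern on its own, outside the simulator. The function read_line_sketch is hypothetical, and it copies into an ordinary buffer with memcpy; in a real SYS_INPUT handler that copy would go to user memory through copyout instead. Stripping the newline, how the length is counted, and the choice of EIO and ENAMETOOLONG as error codes are assumptions of this sketch, not the behavior of the completed code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Read one line from `in` into dest (capacity destlen), without the
 * trailing newline. Returns 0 on success, EIO at end of input or on a read
 * failure, or ENAMETOOLONG if the line does not fit. On success, *len_out
 * (if non-NULL) receives the string length, not counting the terminator.
 */
int
read_line_sketch(FILE *in, char *dest, size_t destlen, size_t *len_out)
{
    char *line = NULL;          /* getline allocates (or grows) this */
    size_t cap = 0;
    int err = 0;
    ssize_t n = getline(&line, &cap, in);

    if (n < 0) {
        free(line);             /* getline may allocate even on failure */
        return EIO;
    }
    if (n > 0 && line[n - 1] == '\n')
        line[--n] = '\0';       /* drop the trailing newline */

    if ((size_t) n + 1 > destlen) {
        err = ENAMETOOLONG;     /* no room for the string plus its NUL */
    } else {
        memcpy(dest, line, (size_t) n + 1);
        if (len_out != NULL)
            *len_out = (size_t) n;
    }
    free(line);                 /* always release getline's buffer */
    return err;
}
```

The important habit is that every path out of the function frees the buffer, including the error paths.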
Implemented, a run might look something like
- kern: Handling syscall 1
Enter your name:
- kern: Syscall 1 returning 17, err=0
- user: SYS_PRINT returned: 17
- kern: Handling syscall 5
Melissa <-- User types this and presses Enter
- kern: Syscall 5 returning 8, err=0
- user: SYS_INPUT returned: 8, 'Melissa'
Give it a try!
Meh. Gotta save some time to do the rest of the lesson.
Actually that's it for today. We've deliberately left space in the lesson so you can have time to noodle around getting a better feel for system calls.
Woo-hoo, free time!
Well, the hope with giving you time to explore is that you'll use it. But if you don't, that's okay, too.
What if we're in the zone and want to keep going?
If the next lesson isn't out yet, you can always read over Homework 4, which is all about system calls.
You can check out the completed code if you want to see a working implementation.
Deeper Dive: How the Simulator Works
For those curious about the inner workings of our simulator, let's take a peek under the hood!
Ooh, I love learning MORE about how things work behind the scenes!
But wait, didn't we just learn about the user and kernel spaces? Where does the simulator fit in?
Good question. Our simulator is actually running on top of a real operating system, simulating both the user and kernel spaces within regular processes.
As we mentioned earlier, our simulator uses two separate Unix processes:
- A “kernel” process that handles system calls
- A “user” process that makes system calls
These two processes communicate using a Unix domain socket pair, which allows bidirectional communication between them.
Here's a simplified flow of how a system call works in our simulator:
- In user.c, the user process calls system_call().
- Inside system_call(), which is in user_internal.c, it sends a message to the kernel process via the socket.
- The kernel process has a loop in syscall_responder() (in kernel_internal.c) that receives the message and calls handle_syscall() to process the system call.
- The handle_syscall() function calls the appropriate system-call handler function, like sys_print() or sys_get_pid(), and then returns the result.
- Then syscall_responder() sends back the result to the user process, where it is received by the system_call() code in the user process.
- In the user process, system_call() returns the result and resumes the user code. The kernel process waits for the next system call message.
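The flow above can be sketched in miniature with socketpair and fork. This is a toy, not the simulator's code: the message structs, the made-up DEMO_SYS_DOUBLE call, and the function names are all assumptions for illustration, and for brevity it creates a fresh socket pair and "kernel" process for each call, whereas the simulator keeps one kernel process running and waiting for the next message.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical message layouts; the simulator's real structs may differ. */
struct demo_request { int num; intptr_t arg; };
struct demo_reply   { long ret_val; int err; };

#define DEMO_SYS_DOUBLE 1       /* made-up call: returns twice its argument */

/* "Kernel" side: receive one request, handle it, and send one reply. */
static void
demo_kernel(int fd)
{
    struct demo_request req;
    struct demo_reply rep = { -1, 0 };

    if (read(fd, &req, sizeof req) != (ssize_t) sizeof req)
        return;
    if (req.num == DEMO_SYS_DOUBLE)
        rep.ret_val = 2 * (long) req.arg;
    else
        rep.err = ENOSYS;       /* unknown system-call number */
    if (write(fd, &rep, sizeof rep) != (ssize_t) sizeof rep)
        return;
}

/* "User" side: send one request and block until the reply arrives. */
long
demo_system_call(int num, intptr_t arg)
{
    int fds[2];
    struct demo_request req = { num, arg };
    struct demo_reply rep = { -1, EIO };
    pid_t child;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {           /* the child plays the kernel */
        close(fds[0]);
        demo_kernel(fds[1]);
        _exit(0);
    }
    close(fds[1]);              /* the parent plays the user process */
    if (write(fds[0], &req, sizeof req) == (ssize_t) sizeof req
        && read(fds[0], &rep, sizeof rep) != (ssize_t) sizeof rep) {
        rep.ret_val = -1;
        rep.err = EIO;
    }
    close(fds[0]);
    waitpid(child, NULL, 0);
    if (rep.err != 0) {
        errno = rep.err;        /* report failure the way wrappers do */
        return -1;
    }
    return rep.ret_val;
}
```

With this sketch, demo_system_call(DEMO_SYS_DOUBLE, 21) comes back as 42, while an unknown call number returns -1 with errno set to ENOSYS.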
But how does the simulator handle memory copying between user and kernel space? Real operating systems can't directly access user memory, right?
Exactly right! Our simulator needs some tricks to handle this memory copying. Let's dive into the details.
In our simulator, the copyin, copyout, and copyinstr functions are also implemented in kernel_internal.c. Here's how they work:
- The user-side code in system_call() is a bit more complex than we implied earlier. After it sends the system-call number and arguments to the kernel, it waits, accepting various kinds of messages from the kernel. One of these messages says that the system call is done and the result is ready to be read, but there are others as well.
- When the kernel needs to access “user” memory, it sends a memory-access request to the user process. There are three different kinds of memory requests: KREQ_READ_MEM, KREQ_WRITE_MEM, and KREQ_READ_STRING, corresponding to copyin, copyout, and copyinstr.
- When accessing memory on behalf of the kernel, the user process sets up special signal handlers for SIGSEGV and SIGBUS to catch memory-access errors.
- If accessible, the user process copies the data and sends it back to the kernel process.
- If not accessible, the user process returns an error (EFAULT).
This approach lets us simulate the protection boundary between user and kernel space, even though both are running as regular processes.
I looked at the code and it's even more complicated! There's copyin_prim, copyout_prim, and copyinstr_prim. What's the deal?
Our protocol limits the size of messages that can be sent between the user and kernel processes. If the data to be copied is too large, we need to break it up into smaller chunks. The copyin_prim, copyout_prim, and copyinstr_prim functions max out at 1024 bytes (MAX_DATA_SIZE). The real copyin, copyout, and copyinstr functions call these primitives in a loop to handle larger data sizes.
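The chunking loop might look roughly like the following. This is a sketch of the idea only: sketch_copyin_prim is a stand-in that uses a plain memcpy in place of the real message exchange, and the prim_calls counter exists purely so you can see how many chunks were needed.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_DATA_SIZE 1024      /* per-message limit named in the text */

int prim_calls;                 /* counts primitive calls, for illustration */

/* Stand-in primitive: handles at most MAX_DATA_SIZE bytes per call. */
int
sketch_copyin_prim(const void *usersrc, void *dest, size_t len)
{
    if (len > MAX_DATA_SIZE)
        return EINVAL;
    memcpy(dest, usersrc, len); /* the simulator would send a message here */
    prim_calls++;
    return 0;
}

/* Copy any number of bytes by calling the primitive once per chunk. */
int
sketch_copyin(const void *usersrc, void *dest, size_t len)
{
    const char *src = usersrc;
    char *dst = dest;

    while (len > 0) {
        size_t chunk = len < MAX_DATA_SIZE ? len : MAX_DATA_SIZE;
        int err = sketch_copyin_prim(src, dst, chunk);

        if (err != 0)
            return err;         /* stop at the first failed chunk */
        src += chunk;
        dst += chunk;
        len -= chunk;
    }
    return 0;
}
```

Copying 2500 bytes this way takes three primitive calls: two full 1024-byte chunks and a final 452-byte one.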
Meh. Haven't you just massively overcomplicated things? Couldn't we just use regular function calls?
While that would be simpler, it wouldn't accurately represent how real operating systems work. The point of the simulation is to help us understand the complexities and constraints of real system calls.
Does this mean our simulator is slower than real system calls?
Yes, it is slower because of the interprocess communication. But for educational purposes, this trade-off allows us to clearly demonstrate the concepts without needing to modify a real operating system!
And it also demonstrates a fundamental truth. System calls are expensive compared to regular function calls, because they involve a lot of work to cross the user/kernel boundary. It's more work in the simulator than in a real OS, but system calls always require more work than a regular function call.
By using this approach, our simulator provides a realistic approximation of how system calls and memory protection work in a real operating system, all while running as a user-space program on top of an existing OS. Pretty neat!