CS 134

OS/161 File and Process System Calls Design

1. Introduction

This document describes the design and implementation of file and process system calls in OS/161. The goal is to support basic file operations and process management, including file descriptors, open/close/read/write operations, and process forking.

2. Key Components

2.1 OpenFile System

The OpenFile system manages open file entries, providing an abstraction layer between file descriptors and the underlying VFS (Virtual File System).

Key Features:

  • Tracking of file offset for each open file
  • Reference counting so that OpenFile objects can to be shared among multiple file descriptors
  • Access mode (read, write, or both) for the open file for error checking
  • Ensure that file operations execute atomically (e.g., if two threads try to write to the same file, one write happens first, then the other, not both trying to happen at once)
  • Abstraction of file operations (read, write, seek) so that higher-level code can work with files without knowing the details of the underlying VFS file calls

2.2 FileTable System

The FileTable system manages file descriptors for each process, providing a layer of indirection between process-specific file descriptors and OpenFile entries. This arrangement is necessary to support dup/dup2 and duplication of all file descriptors in a forked process.

Unlike the OpenFile system, which builds on the VFS layer but hides those VFS details, this layer builds on the OpenFile system but expects users will still make direct calls to the OpenFile system.

Key Features:

  • Provides mapping between file descriptors and OpenFile structures

2.3 PID System

(This part does not need to be done for part 1 of the assignment.)

The PID System is heavily based on the Week 5, Lesson 2 and provides support for waitpid, allowing code like this to work:

pid_t pid = fork();
if (pid == 0) {
    // child does something here
    _exit(0);
} else {
    // parent
    int status;
    waitpid(pid, &status, 0);
}

The PID system provides vital support for

  • fork by giving the child process a new PID
  • _exit by storing the exit status of the process and/or waking up any processes waiting on it
  • waitpid by retrieving the exit status of a child process and/or blocking until it exits

The design is very similar to the one in the lesson, which used fixed-size table for the PIDs with tagged indexing to give a broader range of PID values and avoid quick reuse of PIDs. A few tweaks are made to use pid_t instead of unsigned int for PIDs, and the addition of locking to make it thread-safe.

Whereas the code from the lesson stored a process name in struct procinfo, here we need to store whether the process is running and what its exit status is.

2.4 Process Structure Updates

The Process structure (struct proc) is updated to include a file table, allowing each process to maintain its own set of file descriptors and note the process's PID.

Key Changes:

  • Addition of the file table to the process structure
  • Implementation of process creation and destruction with proper file table management
  • PID assignment to each process (not implemented in part 1 of the assignment)

2.5 System Calls

New system calls are implemented to support file and process operations. Implementing a system call does not require any changes in user space as the support is already present in the OS/161 library. We just need to add a case in the switch statement in syscall in arch/mips/syscall/syscall.c to call a handler function for each system call. By convention, the handler functions are named sys_<name> and the system-call–number constants for the case statement are named SYS_<name>.

We need to implement the following file-related system calls: open, close, read, write, lseek, dup, dup2, __getcwd, and chdir, and the following process-related system calls: fork, _exit. In part two of the assignment, we also implement getpid, waitpid, and execv.

3. Design Details

Here is a more in-depth look at the design of the key components and how they interact with each other.

3.1 OpenFile System

The OpenFile structure encapsulates the state of an open file, including:

  • A lock for synchronization
  • A reference count to track how many file descriptors are referring to this open file
  • A vnode pointer for the underlying VFS open file
  • The current file offset
  • The access mode (read, write, or both)

Key operations:

  • openfile_open: Opens a file and creates an associated OpenFile structure on the kernel heap, returning a pointer to it (and an error code)
  • openfile_share: Increments the reference count for an OpenFile object
  • openfile_close: Decrements the reference count and closes the file if it reaches zero
  • openfile_read, openfile_write, openfile_seek: Perform the respective operations on the file

3.2 FileTable System

The FileTable system manages file descriptors for each process:

  • It is a fixed-size array with OPEN_MAX entries (the maximum number of open files per process)
  • Each entry in the file table points to an OpenFile structure

Key operations:

  • filetable_init: Initializes an empty file table
  • filetable_destroy: Closes all open files and frees resources
  • filetable_get: Retrieves the OpenFile associated with a file descriptor
  • filetable_set: Sets or updates a file descriptor's associated OpenFile, closing the previous file if necessary
  • filetable_nextfree: Finds the next available file descriptor
  • filetable_copy: Creates a copy of the file table for fork operations

Note, it is the caller's responsibility to have upped the reference count on the OpenFile if adding a duplicate file descriptor to the file table.

All of the functions except filetable_init and filetable_destroy can be assumed to operate on the current process (curproc) and so don't need to take a struct proc * as an argument. This approach helps to enforce the idea that because only the current process can access its file table (and processes are single threaded) we don't need a lock for the file table.

While thinking about the file table, it's good to remember that processes started from the kernel menu need to have their standard input, output, and error file descriptors set up for them. It's tempting to put this setup code in runprogram, but it turns out to be better to put the code in cmd_progthread in kern/main/menu.c, which is the function that calls runprogram for the kernel menu. See the system call section for more details.

3.3 PID System

As mentioned earlier, the PID system is based on the Week 5, Lesson 2, using a fixed-size table for PIDs with tagged indexing to avoid quick reuse of PIDs.

Key operations:

  • pid_bootstrap: Initializes the PID system
  • pid_alloc: Allocates a new PID for a process
  • pid_free: Frees a PID
  • pid_exit: Marks a process as exited and wakes up any processes waiting on it
  • pid_wait: Waits for a process to exit or retrieves its exit status if it has already exited

Note that while the code in the lesson provides pid_to_info and that function still exists and is used, it's now an internal function and not part of the public API. Likewise, struct procinfo is not part of the public API.

Locking and Waiting

The allocation and freeing functions hold a lock on the entire PID system to ensure that only one process can allocate or free a PID at a time.

When waitpid call finds that the process hasn't exited yet, it needs to wait. To support this, we could put a lock and condition varible in the process info structure. But that would be a lot of locks and condition variables. On the other extreme, we could use just one lock and condition variable for the entire PID system. But that would be a lot of contention. So, we adopt a middle ground: we have a much smaller array (only 13 elements) of locks and condition variables used for notifications. The notification entry to use is just pid % 13. This should be enough to avoid contention while still keeping the number of locks and condition variables reasonable. If two PIDs happen to share the notification entry, that's fine, it just means that someone will wake up spuriously, be disappointed and go back to sleep.

Supporting WNOHANG

Although supporting WNOHANG is not required, it's quite easy. Just don't wait if WNOHANG is set, instead return the error ECHILD if no children have exited yet.

Supporting -1 as the PID

In POSIX, passing -1 as the PID to waitpid means to wait for any child to exit. This feature is not required in OS/161. Supporting it is trickier than it might seem, as there are various edge cases to consider. Are you sure there won't be lost wakeups? How do we wait for any of multiple PIDs to exit? What happens if a child exits while we're scanning through the children? How do we know who the children are? That's a lot of complexity for a feature that's not required.

Nevertheless, it is supported in this implementation. If you want to know how it's done, talk to Prof. Melissa.

3.4 Process Structure Updates

The struct proc is updated to include a file table:

struct proc {
    // ... existing fields ...
    struct filetable p_filetable;
    // ... other fields ...
};

This allows each process to maintain its own set of file descriptors. The file table is initialized in proc_create and destroyed in proc_destroy. We could make it a pointer instead of directly embedding it in the struct proc, but it's not that big and it's simpler this way.

PID Assignment

(This part does not need to be done for part 1 of the assignment.)

The struct proc is updated to include a PID field:

struct proc {
    // ... existing fields ...
    pid_t p_pid;                    /* my pid */
    // ... other fields ...
};

The pid is set with pid_alloc when a process is created. Note that pids are not freed when the process is destroyed, they're freed when the process is waited on and found to be dead.

3.4 Process Lifecycle Management

A key addition to the process management system is the implementation of proc_exit, which handles the termination of a process. This function plays a crucial role in cleaning up resources and ensuring proper process shutdown.

proc_exit Implementation

The proc_exit function is responsible for:

  1. Handling the exit status of the terminating process (not implemented in part 1 of the assignment, a call to pid_exit in part 2).
  2. Ensuring that the kernel process (kproc) cannot exit.
  3. Detaching the current thread from the process.
  4. Destroying the process structure, which includes:
    • Closing all open file descriptors (via filetable_destroy)
    • Freeing the process's address space
    • Releasing any other resources held by the process
  5. Terminating the thread.

Note: The exit status of a process is value made using the macros in <kern/wait.h>, such as _MKWAIT_EXIT(exitcode) and _MKWAIT_SIG(signal).

Note: It's probably wise to think about atomicity/race conditions when writing this function. We probably don't want to switch threads in the middle of cleaning up a process.

kill_curthread Changes

The kill_curthread function is updated to call proc_exit the appropriate signal code when the current thread is killed.

Full support for _exit and waitpid

In part 2 of the assignment, we need to support waitpid. All the real work is done by the PID system, described above. We just need to call pid_exit in proc_exit to mark the process as exited, and pid_wait in sys_waitpid to wait for the process to exit.

Also, common_prog from the menu system is updated to call pid_wait so that it waits for the executed program to exit before returning to the menu and reports the exit status of the program.

Full support for execv

In part 2 of the assignment, we need to support execv. Here's how...

We add a global buffer to the kernel, of size ARG_MAX characters, protected by a lock so only one process can use it at a time. This buffer is used to store the arguments passed to execv by the user process.

When a process calls execv, we need to first extract argument information from the process whose address space we're about to destroy. This is done as follows:

  • Copy over the name of the program we're going to run (1st argument).
  • Grab the lock on the kernel buffer.
  • Copy the arguments from the user process into the kernel buffer (starting at the lowest address as usual). Note how many there were, and how much space they took up.

At this point, we no longer need any information from the process who called execv. But we can't destroy its address space yet because future steps might fail.

  • Create and initialize a new address space for the program, much like runprogram does. If that fails, we bail out. This includes:
    • Loading the executable into memory.
    • Allocating space for the stack (giving us a pointer to the top of the stack).
  • Destroy the old address space, activate the new one.
  • Adjust the stack pointer to allocate space for the arguments, remember this as argdest.
  • Align the stack pointer to a multiple of sizeof(uintptr_t) (e.g., with sp -= sp % 4).
  • Adjust the the stack pointer to have space for an array of pointers for all the arguments, plus a NULL pointer at the end. Remember this as argvdest.
  • Copy each the arguments from the buffer to the user stack, starting at argdest, using copyoutstr to write it.
    • Each time we copy an argument, we also need to add to the user's argv array that starts at argvdest, so we need to write argdest to argvdest + i * sizeof(char *) with copyout to write the address of the argument in the user space.
    • copyoutstr has an out argument at the end that provides the number of bytes copied, which we can use to adjust argdest for the next argument.
  • Add the final NULL pointer to the end of the array.
  • Release the lock on the buffer.
  • Call enter_new_process passing in the number of arguments, the address in userspace of the argv array, and the address of the top of the stack.

Much of this functionality in the second half already exists in runprogram, so it makes sense to adjust that code to be more modular and reusable, perhaps renaming it to sys_exec_internal to capture the fact that it is the kernel-data-only of the sys_execv system call. In other words, we the code is adjusted as follows:

  • Write sys_execv to copy the filename and arguments from userspace, and then call sys_exec_internal to do the rest.
  • Refactor runprogram to become sys_exec_internal to take not only the filename, but also whatever information about arguments copied to kernel space that sys_execv needs.
    • It now needs to deal with either there being no address space (brand new process, created from the menu system), or an old address space (called from sys_execv).
  • Change cmd_progthread to call sys_exec_internal instead of runprogram, and now pass the program filename twice, once as the executable to run, and once as the zeroth argument to the program.

Note that the lock on the kernel buffer is acquired in sys_execv and released in sys_exec_internal. Any other code, like the menu system, also needs to acquire the lock before calling sys_exec_internal. Any error path out of sys_exec_internal must release the lock.

3.5 System Call Implementation

For the most part, the whole point of writing the subsystems, OpenFile, FileTable, and PID, is to make the system call handlers as simple as possible. They should just be a few lines of code that tie together the subsystems to perform the desired operation.

Implmenting a system call comes in two parts, writing a handler function (described here) and adding a case to the switch statement in syscall in arch/mips/syscall/syscall.c, described in the next section.

Most system calls return two things, a possible error code (zero for no error) and some other value. In C, the usual way to return two values from a function is to add an out parameter for the second value as an additional argument. For example, int sys_read(int fd, userptr_t buf, size_t buflen, int *retval), would return any error as the return value and the number of bytes by writing into *retval.

The code for these belongs in kern/syscall/file_syscalls.c.

  • sys_open: Opens a file and returns a new file descriptor
    • Copy the filename from user space to kernel space
    • Call openfile_open to open the file and create an OpenFile structure
    • Call filetable_nextfree to find the next available file descriptor
    • Call filetable_set to associate the file descriptor with the OpenFile structure
  • sys_close: Closes a file descriptor
    • Call filetable_get to sanity-check the file is open
    • Call filetable_set to close the file by passing NULL as the OpenFile structure
  • sys_read: Reads from a file descriptor
    • Call filetable_get to retrieve the OpenFile structure and sanity-check the file is open
    • Call openfile_read to read from the file
  • sys_write: Writes to a file descriptor
    • Call filetable_get to retrieve the OpenFile structure and sanity-check the file is open
    • Call openfile_write to write to the file
  • sys_lseek: Changes the file offset
    • Call filetable_get to retrieve the OpenFile structure and sanity-check the file is open
    • Call openfile_seek to seek to the new offset
  • sys_dup2: Duplicates a file descriptor to a specific file descriptor number
    • Return if we're trying to duplicate a file descriptor to itself
    • Call filetable_get to retrieve the OpenFile structure and sanity-check the file is open
    • Call openfile_share to increment the reference count
    • Call filetable_set to associate the new file descriptor with the OpenFile structure
  • sys_dup: Duplicates a file descriptor (not required by the assignment, but super easy to implement)
    • Call filetable_nextfree to find the next available file descriptor
    • Reduce to a call to sys_dup2

The above system call handlers interact with the FileTable and OpenFile systems to perform their operations. In contrast, the two below are just simple shims stright to the VFS layer:

  • sys__getcwd
    • Set up a UIO based on the user pointer and length
    • Call vfs_getcwd to get the current working directory into that space
  • sys_chdir
    • Copy the filename from user space to kernel space
    • Call vfs_chdir to change the current working directory
Calling File Functions from the Menu System

In cmd_progthread (or runprogram) we need to set up the standard input, output, and error file descriptors for the new process. The sys_open system call exists to open files, but it is designed to be called from user space. So we split the function into two halves, the first part (corresponding to the first bullet above) copies the filename from user space to kernel space and then hands off all the remaining work to a second function, sys_open_internal, which does the actual work of opening the file. Because this function takes a kernel-space filename argument, it can be called from inside the kernel. In contrast, sys_dup and sys_dup2 don't read any data from user space, so they can be called directly from the menu system.

Specifically, the setup needed is:

  • Create a writable string buffer that contains filename for the console, "con:"
  • Call sys_open_internal to open the console for reading (if the file table is empty, it should be allocated file descriptor 0, which is what we want for standard input)
  • Restore the damage done to the string by getdevice()'s colon destroying antics
  • Call sys_open_internal to open the console for writing (should get file descriptor 1, as desired)
  • Duplicate the file descriptor for writing to file descriptor 2 (standard error)

The code for these belongs in kern/syscall/proc_syscalls.c.

  • sys_fork: Creates a new process as a copy of the current process (it's passed the trapframe)
    • Copies the address space
    • Creates a new process structure
    • Copies the file table
    • Copies the trapframe to the kernel heap
    • Creates a new thread that starts a helper function to run the child process, where the helper function
      • Copies the on-heap trapframe its own stack (as a struct trapframe variable)
      • Frees the on-heap trapframe
      • Calls enter_forked_process with the copied trapframe
  • sys__exit: Terminates the current process
    • Just calls proc_exit with the exit code

In part 2 of the assignment, we need to support the following:

  • sys_fork: As above, but now assigns a PID to the child process.
  • sys_exit: Exactly as above (but proc_exit now calls pid_exit).
  • sys_getpid: Returns the PID of the current process.
    • Just curproc->p_pid.
  • sys_waitpid: Waits for a specific process to exit.
    • Just calls pid_wait and then deals with copying the exit status back to the user process.
  • sys_execv: Replace the current process with a new program.
    • As described above.

3.6 Trap Handling and Program Execution

We add cases to the switch statement in the syscall in arch/mips/syscall/syscall.c to call the appropriate system call handler. Any register value that represents a pointer needs to be cast to a userptr_t before being passed to the system call handler. As noted in the written part of the homework (C.7), lseek is a bit annoying due to the 64-bit offset. The sample written answer provides good guidance on how to do this. The trap handling code (trap.c) is updated to properly handle abnormal process termination, calling proc_exit with the appropriate code for a signal.

The runprogram function is modified to set up standard input, output, and error file descriptors for new processes.

4. Synchronization and Concurrency

  • The OpenFile structure uses a lock to ensure thread-safe operations on individual files.
  • The FileTable does not require its own lock, as it is per-process and OS/161 processes are single-threaded.
  • The PID system uses a lock to ensure that only one process can allocate or free a PID at a time and uses a small array of locks and condition variables for notifications.
  • Process-wide operations (e.g., fork, exit) use existing synchronization mechanisms in the process structure.

5. Error Handling

Error codes are consistently used and propagated through the system call implementations. Common errors include:

  • EBADF for invalid file descriptors
  • EINVAL for invalid arguments
  • EMFILE when the process has too many open files
  • ENOMEM when memory allocation fails
  • ECHILD when no children have exited yet

6. Limitations and Future Work

  • For part 1 of the assignment
    • PIDs are not yet implemented, affecting the behavior of fork() and process management (fork just returns 1 in the parent and 0 in the child).
    • execv is not implemented
  • Signals are not implemented beyond those from kill_curthread.
  • While the implemented file operations closely follow POSIX semantics, full POSIX compliance would require additional system calls and more comprehensive error checking.

7. Conclusion

With these system calls in place, we can run significant programs. OS/161 goes from being able to do almost nothing from a user-program perspective to supporting the much of the core functionality of a POSIX system. That's pretty cool!

(When logged in, completion status appears here.)