Design and Implementation Requirements
The first week of this assignment is devoted to design, although you can write some ancillary code even during the first week. Your first goal is to work out how you are going to achieve the objectives of this assignment, so that both members of the team will have a clear idea of what it is you are trying to achieve when you begin coding. At the end of the design phase, you will bring your design documents to lab and be ready to explain your design to others (ideally, you should have electronic versions, too). After class you may use your own design as is, or borrow from someone else's to use in synthesizing your final design. (Remember to cite any borrowed elements.)
You will need to create a file handin/design.md that contains your design document. This document should be a (GitHub-flavored) Markdown file that describes your design for everything you will implement in this assignment. GitHub's Markdown flavor supports various features, including tables, code blocks, and links. There's a guide to basic GitHub-flavored Markdown, and one to more advanced features, such as diagrams using _Mermaid.
You are also required to provide an implementation plan and time budget in the file handin/plan.md. This requirement is intended to help you—if you are not disciplined in managing your time, you will have severe difficulties completing this assignment—you cannot expect to do all the work solely in lab time.
Phase 1: Basic File and Process System Calls
In this part, you will need to design and implement support for the following system calls:
open,read,write,lseek,close,dup2,chdir,__getcwdfork,_exit(preliminary implementation without PID support)
Your implementation of these system calls must
- Use the system-call numbers defined in
kern/include/kern/syscall.h - Conform to the user-level API given in
userland/include/unistd.h - Reflect the (POSIX-style) semantics given in the OS/161 documentation, especially with regard to error codes and return values
- Never crash the OS/161 kernel
The man pages in the OS/161 distribution contain a description of correct return values for various errors. If there are conditions that can happen that are not listed in the man page, return the most appropriate error code from kern/errno.h. If none seem particularly appropriate, consider adding a new one. If you're adding an error code for a condition for which Unix has a standard error-code symbol, use the same symbol if possible. Consult Unix man pages to learn about Unix error codes (see the intro page in Section 2 of the manual; use man man to find out how to get to Section 2). Note that if you add an error code to kern/include/kern/errno.h, you will need to add a corresponding error message to the file src/lib/libc/strerror.c.
I/O-Related Calls
The OS/161 kernel already knows how to open files and character devices, and read and write from them. The problem is that this functionality is not exposed to user programs—no system calls are currently implemented that provide these facilities. Your task is to implement the necessary POSIX-style interface.
The calls in part C1 all manipulate file descriptors. From a user-code perspective, file descriptors are small integers returned by the operating system that signify open files. On a POSIX system, the convention is that for any given process, the first three file descriptors (0, 1, and 2) are considered to be standard input (stdin), standard output (stdout), and standard error (stderr). In your solution, programs run directly from the OS/161 menu system should start out with these file descriptors attached to the console device (con:), but your implementation must allow programs to use dup2 to change them to point elsewhere.
Although these system calls may seem to be tied to the filesystem, in fact, they're really about manipulation of file descriptors or process-specific filesystem state. A large part of this assignment is designing and implementing a system to track this state. You must determine what information is specific only to the process, what is specific to the file descriptor, and how the two relate. Don't rush your design. Think carefully about the state you need to maintain, how to organize it, and when and how it has to change. Important questions include
- What data needs to be protected by locks?
- What happens to file descriptors when a process
forks, and what about afterwards? - Will
dup2pose any complications for your design? - Are you really sure what
dup2does?
Remember that you will not actually have to write any low-level file system code to implement these system calls. You will use the existing VFS layer to do most of the work. Your job is to construct the subsystem that implements the interface expected by user-level programs by invoking the appropriate VFS and vnode operations.
For consistency, we will expect you to put most of your implementation of file-related system calls in the following files:
kern/proc/filetable.candkern/include/filetable.hfor managing file descriptorskern/proc/openfile.candkern/include/openfile.hfor managing open-file objectskern/syscall/file_syscalls.cfor the system-call handlers for file-related system calls
This structure may give you some strong hints about what might be a good design. The corresponding header files will also need to be updated to reflect the new system calls you are implementing.
Process-Related Calls
In this part of the assignment, you should implement a simplified form of the fork system call. Your implementation should be the same as that described in the fork man page, except that it should return 1 to the parent (rather than the child's PID).
For consistency, we will expect you to put most of your implementation of process-related system calls in the following files:
kern/proc/proc.candkern/include/proc.hfor managing processes themselveskern/syscall/proc_syscalls.cfor the system-call handlers for process-related system calls
The amount of code to implement fork and _exit is quite small at this stage; the main challenge is to understand what needs to be done and where. In particular, you should design and implement the file-related system calls with fork in mind.
Some hints:
- Read the comments in kern/include/addrspace.h; particularly as_copy
- In many ways, the solution to fork is setting up the right arguments to thread_fork, including
- A way to end up calling enter_forked_process with the right arguments
- An appropriate struct proc to pass for the new child
- The file-descriptor table also needs to be duplicated. This task isn't especially hard, but it is nevertheless possible to do it wrong
Robustness
Currently, if user-space code generates a fatal exception, the kernel panics. This placeholder solution is obviously not acceptable, and needs to be fixed now that you'll be running actual programs in user space. You'll need to replace kill_curthread with a new version that properly handles such issues more sanely. You should, of course, try to write it in as simple a manner as possible. Keep in mind that essentially nothing about the current thread's user-space state can be trusted if it has suffered a fatal exception—it must be taken off the processor in as judicious a manner as possible, but without returning execution to the user level.
Phase 2: Advanced Process-Related System Calls
In the final part of the assignment, you will implement the following system calls:
getpidexecv,waitpid, and_exit
Implementing Process IDs
A PID, or process ID, is a unique number that identifies a process (as such, it has much in common with a file descriptor; both are integers that let user-space programs refer to kernel data safely). The implementation of getpid is not terribly challenging, but PID allocation and reclamation are the important concepts that you must implement. It is not okay for your system to crash because over the lifetime of its execution you've used every possible PID once. Design your PID system, implement all the tasks associated with PID maintenance, and only then implement getpid.
The file kern/proc/pid.c is provided as a starting point for your PID system. You may use this file as is, modify it, or replace it with your own implementation.
Process Creation, Execution, and Termination
In this part, you must revise fork so that a newly created process is given a new PID and that PID is returned to the parent of the process (whereas the child should return 0).
There is sample code for a simple PID subsystem in Week 5, Lesson 2. You may adapt this code if you like, provided you give proper attribution. You may also choose to implement a different PID system. An important question to answer is what information you need to store in the process table to properly support _exit and waitpid. You should also consider how to handle the case where a parent process exits before its child process.
Although it may seem simple at first, waitpid requires a fair bit of thought. Read the specification carefully to understand the semantics, and consider these semantics from the ground up in your design. You may also wish to consult the Unix man page, although you should keep in mind that you are not required to implement all the things that Unix's waitpid supports—nor is the Unix parent/child model of waiting the only valid or viable possibility.
Your revised implementation of _exit is also intimately connected to your implementation of waitpid. They are essentially two halves of the same mechanism. In all likelihood, your code for _exit will be simple and your code for waitpid more complex—but it's perfectly viable to design it the other way around. If you find that both functions are becoming extremely complicated, it may be a sign that you should rethink your design.
Finally, although execv is “only” a system call, it is perhaps the most challenging part of this assignment. It is responsible for taking a process and making it execute a new program. Essentially, it must replace the existing address space with a brand new one for the new executable (created by calling as_create in the current dumbvm system) and then run it. While this sequence is similar to starting a process straight out of the kernel (as runprogram does), it's not quite that simple. Remember that this call is coming out of user space, into the kernel, and then returning back to user space. You must manage the memory that travels across these boundaries very carefully. (Also, notice that runprogram doesn't take an argument vector—but arguments must, of course, be handled correctly in execv).
For debugging your argument-copying code, you may find the syscall simulator in Week 6, Lesson 1 useful. You can use it to simulate a portion of your execv implementation without having to run the full OS/161 kernel.
Hints & Tips
You need to think about a variety of issues associated with implementing system calls. In this section, we raise some questions and offer a few tips.
- Can two different user-level processes find themselves running a system call at the same time? (Whatever conclusion you reach, your conclusions in matters such as these belong in your design document.)
-
When you undertake the second part of this assignment, your system must allow user programs to receive arguments from the command line. For example, your shell should be capable of executing lines (in user programs) such as
char *filename = "/bin/cp"; char *args[4]; pid_t pid; args[0] = "cp"; args[1] = "file1"; args[2] = "file2"; args[3] = NULL; pid = fork(); if (pid == 0) execv(filename, argv);That code will load the executable file
/bin/cp, install it as a new process, and then execute it. The new process will then findfile1on the disk and copy it tofile2.
Some questions to think about:
-
Passing arguments from one user program, through the kernel, and into another user program is a bit of a chore. It is also rather tricky, and there are many ways to be led astray. You will probably find that very detailed pictures and several walkthroughs will be very helpful.
-
What primitive operations exist to support the transfer of data to and from kernel space? Do you want to implement more on top of these?
-
How will you determine
- The stack pointer's initial value?
- The initial register contents?
- The return value?
- Whether you can
execthe program at all?
-
You will need to bulletproof the OS/161 kernel from user-program errors. There should be nothing that a user program can do to crash the operating system (with the exception of explicitly asking the system to halt).
-
What new data structures will you need to manage multiple processes?
-
What relationships do these new structures have with the rest of the system?
-
How will you manage file accesses? When your shell invokes the
catcommand, and thecatcommand starts to read from file descriptor 0, what will happen if the shell also tries to read from the same file descriptor? What would you like to happen?
Finally, remember to keep it simple. Both open and fork are allowed to return errors saying that you've opened too many files at once or forked too many processes at once, which, in turn, means that you are allowed to use fixed-sized tables.
(When logged in, completion status appears here.)