CS 134

Note: This design was reformatted as the original student-submitted Markdown did not render well.

Design plan

The following is our design plan for implementing basic file and process system calls, as well as advanced process-related system calls.

Basic File and Process System Calls

Whenever not otherwise noted, presume each call sets the appropriate errno before returning unsuccessfully.

General Principles

For I/O-related calls there will be some design decisions common across all calls. For example, data will be lock protected any time the file is being accessed by a process.

Open files are represented by the open_file object. This stores a pointer to a vnode, an offset, access mode, and additional infrastructure to permit safe concurrent access, which we will cover later on. open_file implements create, destroy TODO

File desciptors are process-specific, so the same (valued) file desciptor across different processes may point to different open_file objects, and different file descriptor (values) may point to the same open_file object. As such, we require an additional layer of indirection inside each process to translate file descriptors into open_file objects. This is p_filetable, which is a statically allocated array within each process of pointers to open_file objects, indexed by file descriptor.

Again, this enables shared access to open_file, which contains a mutable offset, access to which must be protected via a lock. open_file must be allocated on the heap allocated because it must outlive the operations performed upon it (save for destroy). As such, we must avoid a dangling pointer by delaying deallocation until there are no file descriptors which point to the object. To satisfy this constraint, we add a reference count, destroying the object only after there exist no outstanding references.

We've alluded to two situations in which we can have shared references to open_file objects, and here are the two function calls from which they respectively arise:

  • dup(): When dup is called, the kernel returns a file descriptor of different value within the same process which points to the same open_file.
  • fork(): When forked, the filetable is cloned along with the rest of process state. This results in file descriptors of the same values between parent and child processes with each same-value file descriptor pointing to the same open_file. Mutations performed by reference to open_file objects will be observed by all processes holding the same reference, however modifications within the process to the filetable itself will not be shared.

open()

open() opens/creates a file at a given path under a specified access mode, returning a file descriptor, or -1 in the case of an error.

At a high level, open() achieves this through the following operations:

  • creates and obtains a reference to a new open_file object at offset 0 in the stated access mode
  • assigns the reference to the first open entry (null pointer) in the process' filetable, located via linear search
  • returns to the user the index of the reference to the open_file within the process' filetable, which is the file descriptor

read()

read() attempts to read up to a specified maximum number of bytes from a open file of read-enabled access mode into a specified memory address. The open file is referenced by file descriptor. It returns the number of bytes read, where a returned 0 indicates that the end of the file has been reached. -1 is returned in the case of an error.

At a high level, read() achieves this through the following operations:

  • translates the file descriptor to an open_file object via the process' filetable
  • creates a uio for data transfer of the desired maximum read size at the specified offset
  • under protection of the open_file lock:
    • calls VOP_READ to read from the vnode into the uio according to the parameters previously set in uio
    • the offset of the open_file is advanced by the number of bytes read
  • copies the relevant contents of the uio to the specified memory address (in user space)
  • returns the number of bytes read

write()

write() attempts to write up to a specified maximum number of bytes from a open file of write-enabled access mode into a specified memory address. The open file is referenced by file descriptor. It returns the number of bytes written. -1 is returned in the case of an error.

At a high level, write() achieves this through the following operations:

  • translates the file descriptor to an open_file object via the process' filetable
  • creates a uio for data transfer of the desired write length
  • copies the specified number of bytes starting from the specified memory address (in user space) into the uio
  • under protection of the open_file lock:
    • call VOP_WRITE to write from the uio into the nvode according to the parameters previously set in uio
    • the offset of the open_file is advanced by the number of bytes written
  • returns the number of bytes written

lseek()

lseek(), according to parameters whence and pos, repositions the offset of a open_file pointed to by a provided file descriptor, returning the new position. -1 is returned in the case of error. Such cases include but are not limited to attempting to seek objects which are not seekable, e.g. certain devices represented by the OS as files.

At a high level, lseek() achieves this through the following operations:

  • translates the file descriptor to an open_file object via the process' filetable
  • Check to make sure the file is seekable through IS_SEEKABLE
  • under protection of the open_file lock update its offset according to whence:
    • SEEK_SET: set the offset to the input byte offset
    • SEEK_CUR: set the offset to the old position plus the input offset
    • SEEK_END: set the offset to the size of the file plus the input offset (specified as a negative int)

close()

close() closes a provided file descriptor, returning 0 if successful and -1 on error, for instance if the corresponding file descriptor does not point to any open_file.

At a high level, close() achieves this through the following operations:

  • translates the file descriptor to an open_file object via the process' filetable
  • under protection of the open_file lock:
    • decrements the reference count
    • destroying the open_file object if the resultant value is 0
  • sets the element of the process' filetable corresponding to the provided file descriptor to null.

dup2()

dup2() replicates an existing file descriptor oldfd to a new file descriptor newfd, taking care to properly close newfd if it is already assigned before reassignment. Returns newfd on success, -1 on error.

At a high level, dup2() achieves this through the following operations:

  • if oldfd and newfd are the same, immediately returns with newfd, otherwise proceeds with the remaining steps
  • in the open_file from the filetable entry corresponding to oldfd, under protection of its lock
    • increment its reference count to prevent the reference count of open_file from dropping to zero by an interleaved thread, which would result in deallocation while we still hold a pointer, now dangling.
  • if newfd corresponds to an existing open_file in the process' filetable, closes oldfd according to the procedure in close() (potentially via a kernel function encapsulating the apposite behavior).
  • sets newfd to point to the oldfd open_file pointer. We have already incremented the reference count to account for the additional reference.

chdir()

chdir() changes the current working directory of the process, returning 0 on success, and -1 on error.

It achieves this by calling vfs_chdir with the provided path to switch the process' current working directory

__getcwd()

__getcwd() gets the current working directory of the process, returning a pointer to a string with the pathname on success, and NULL on faliure.

It achieves this by creating a temporary uio and calling vfs_getcwd with the buffer. It then copies the string stored in the buffer to a user space location, and returns the location of the copied string.

In this section, our design covers only a preliminary implementation that does not feature PID support.

Our process struct includes:

  • a name
  • a spinlock (to protect addrspace)
  • the addrspace of the process
  • a number of threads
  • a filetable (fixed-size array of pointers to open_file objects)
  • the current working directory as a pointer to a vnode
  • state: {SLEEP, READY, WAIT, EXIT}
  • processor context (register state, probably as a struct)

fork()

fork() creates a new process that is a duplicate of the calling process, called the child process. In this it will duplicate the address space, file descriptors (creating a new table), and current working directory. Upon success it will return 1, otherwise it will return -1 and set errno. As with UNIX, we clone only the calling thread.

It acheives this through the following operations:

  • Creates a new struct proc with a new spinlock and name, and uses as_copy to create a new address space. Likewise it uses __getcwd() to get the directory for a current working directory vnode.
  • Duplicates the fdTable (also stored in struct proc) by calling dup2() on every file descriptor in the calling thread.
  • Copies the parent process' trapframe (which will have a copy made on the kernel stack for enter_forked_process)
  • Calls enter_forked_process(), thus passing through the trapframe and starting the new process in usermode.

_exit

_exit() terminates the calling process.

It acheives this through the following operations:

  • sets process state to EXIT
  • cleans up all file descriptors (to prevent leaking open_file objects)
  • destroys the addrspace

Kernel functions

kill_curthread will be reimplemented to set the appropriate signal and schedule another thread to avoid having to panic.

This requires an implementation of a process table. This will be implemented through tagged indexes as described in week 5 lesson 2. This will include functionality to allocate, free, and get information about a pid. Likewise fork() will be reimplemented, where it gets a pid from a shared pid process table and assigns it as the name and return value of the forked process. Each process also gets a pointer to its entry in the process table, from which it can obtain proc_info.

getpid()

getpid() gets the pid of the calling process. It does so by accessing the pointer the process stores to its entry in the process table, returning the pid corresponding to to calling process.

execv()

execv takes a process and replaces it with a new user program. It will do so by creating a new address space, running it, and then returning to the user space. It will do so in the following steps:

  • loads from the file specified by the first argument into a vnode
  • copies the arguments into an packed argument buffer specified with a const maxlen, protected under a lock to prevent multiple access
  • verifies that the file is a valid excecutable by checking for the ELF magic numbers
  • creates a new address space with as_create, and populates it from the arguments using as_prepare_load, load_elf, and as_complete_load
  • switches to and activates the address space using proc_setas() and as_activate()
  • runs the program excecutable, storing the result in a uio buffer, which is then copied to an address in userspace with userptr_t

waitpid()

waitpid() waits for a process to change state -- particularly for our application wait for the children of a process to terminate so it can be reaped. On success it returns the pid of the process that has changed, otherwise if WNOHANG is specified then if any children exist but have not changed state it returns 0. On faliure it returns -1. If WUNTRACED is true then also return if a chiled has stopped.

It is important to note that if a child process has already terminated then waitpid will return immediately.

It acheives this by calling wait(), blocking the process until it recieves an interrupt (where WNOHANG is not specified). When a child calls exit() it sends an interrupt that wakes the process, causing the process to then reap the child process and return its pid. If WUNTRACED is set then the parent will respond to an interrupt from a stopped child thread.

exit_()

_exit() terminates the calling process.

It acheives this through the following operations:

  • sets process state to EXIT
  • cleans up all file descriptors (to prevent leaking open_file objects)
  • destroys the addrspace through the function provided by its type
  • calls thread_exit to destroy the calling thread
  • all the above is the same as prior _exit, but with pid support we must also clean up the corresponding entry in the process table, wake up the parent
  • in the event a parent process terminates before a child process, the child is assigned PID 1, where waitpid can then be called to reap it.

(When logged in, completion status appears here.)