CS137 Homework 2: FAT, part 1

Overview

The purpose of this assignment is to begin developing a real filesystem. Because of the complexity of writing a working filesystem, the assignment is divided into two parts. In this first part, you will develop a lot of scaffolding and enough code for your filesystem to do something testable. In the second part, you will complete the filesystem.

You may write in any language that is supported on Wilkes and that supports FUSE, but as mentioned in class, I strongly recommend C or C++. Your code will be tested on Wilkes and must compile and run there.

The Assignment

Your assignment is to develop a FAT-like filesystem that supports the following features:

The general structure of the filesystem is similar to the Microsoft FAT design (see "FAT Design" below).
The filesystem supports the following operations at a minimum: getattr, access, readdir, and mkdir. (Note that at this point it is not necessary to support file I/O, or even files.)
Your mkdir operation must allocate space from the free list.
The filesystem is backed by a SINGLE preallocated 10-MB file with a fixed name, such as "fat_disk". The size of the file should be a #defined constant, of course. Remember to watch out for the working-directory gotcha.
When the filesystem is invoked, if the backing file doesn't exist, it is created and initialized. However, if it does exist, it should be attached and its previous contents should be visible.
When a mutating operation occurs, its effects must be immediately visible in the backing file. (This means that you can't do everything in memory and then wait until exit time to write things out. I will test this feature by killing your process with SIGKILL. O_DSYNC isn't necessary, because the operating system will make sure your data gets to stable storage unless the entire OS crashes—which isn't part of the testing plan!)
Subdirectories must be supported.
Your directories may be fixed-size; it is not necessary to be able to create an arbitrary number of entries in a directory.
Directory entries may also be fixed-size, as long as the name length is moderately reasonable. (Nothing under 16 characters is "reasonable" in my book; my minimal implementation compromises with a limit of 32.)
If you choose, file sizes may be limited to either 2³² or 2⁶⁴ bytes.
The acid test of your filesystem should be that it is possible to create directories, list them (with ls -la returning reasonable results, including "." and ".."), and cd into them.
Other operations are up to you. We will be extending the filesystem to support files, rmdir, etc. in the next assignment, so you are welcome to implement those things. However, they will not be tested in the current assignment.
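Here is a minimal sketch of the startup logic for the backing file, in C with POSIX calls. The names DISK_NAME, DISK_SIZE, resolve_disk_path, and open_backing_file are hypothetical, and "10 MB" is assumed to mean 10 MiB. The working-directory gotcha comes from libfuse: fuse_main() daemonizes through fuse_daemonize(), and unless the filesystem runs in the foreground (-f), that call changes the working directory to "/". A relative name such as "fat_disk" therefore has to be converted to an absolute path before fuse_main() is called.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical names; the assignment only asks for a fixed name and a #defined size. */
#define DISK_NAME     "fat_disk"
#define DISK_SIZE     (10 * 1024 * 1024)   /* assumes "10 MB" means 10 MiB */
#define DISK_PATH_MAX 4096

static char disk_path[DISK_PATH_MAX + sizeof DISK_NAME + 1];

/* Call from main() BEFORE fuse_main(): once FUSE goes into the background its
 * working directory is "/", so the backing file's name must already be absolute. */
int resolve_disk_path(void)
{
    char cwd[DISK_PATH_MAX];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;
    int n = snprintf(disk_path, sizeof disk_path, "%s/%s", cwd, DISK_NAME);
    return (n < 0 || (size_t)n >= sizeof disk_path) ? -1 : 0;
}

/* Attach the backing file if it exists; otherwise create it at full size.
 * Returns an fd (or -1) and sets *created to 1 only when the caller must format it. */
int open_backing_file(const char *path, int *created)
{
    assert(path[0] == '/');                 /* relative paths break once FUSE daemonizes */
    *created = 0;
    int fd = open(path, O_RDWR);
    if (fd >= 0 || errno != ENOENT)
        return fd;                          /* existing disk, or a real error */

    fd = open(path, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, DISK_SIZE) != 0) {    /* fixed size; new bytes read as zeros */
        close(fd);
        unlink(path);
        return -1;
    }
    *created = 1;
    return fd;
}
```

When created comes back as 1, the caller formats the new disk. Otherwise it reads the existing superblock and FAT, so the previous contents stay visible. Note that ftruncate() gives the file its full size but may leave it sparse. Writing zeros over the whole file would reserve the space explicitly.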

Why this particular set of features? It's the minimum necessary to have a filesystem where you can do something visible: create and list directories. You'll find that you need to create quite a bit of scaffolding to get that far (in particular, the code that creates an initialized FAT filesystem from scratch).
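Much of that scaffolding consists of deciding where everything lives on the disk. The sketch below computes one possible layout. It assumes the superblock occupies block 0, a FAT of 32-bit entries follows immediately after it, and a one-block root directory comes next. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: block 0 = superblock, then the FAT, then a one-block
 * root directory, then data blocks that start out on the free list. */
struct fat_layout {
    uint32_t nblocks;     /* total blocks in the backing file */
    uint32_t fat_start;   /* first block of the on-disk FAT */
    uint32_t fat_blocks;  /* blocks needed for one 32-bit entry per block */
    uint32_t root_block;  /* the root directory's block */
    uint32_t first_free;  /* first block on the initial free list */
};

struct fat_layout compute_layout(uint64_t disk_size, uint32_t block_size)
{
    struct fat_layout l;
    assert(block_size >= 512 && (block_size & (block_size - 1)) == 0);

    l.nblocks    = (uint32_t)(disk_size / block_size);
    uint64_t fat_bytes = (uint64_t)l.nblocks * sizeof(uint32_t);
    l.fat_start  = 1;
    l.fat_blocks = (uint32_t)((fat_bytes + block_size - 1) / block_size); /* round up */
    l.root_block = l.fat_start + l.fat_blocks;
    l.first_free = l.root_block + 1;
    return l;
}
```

For a 10 MiB disk with 4096-byte blocks, this layout gives 2560 blocks and a 3-block FAT (2560 × 4 = 10240 bytes, rounded up to whole blocks). Blocks 0 through 4 then hold the superblock, the FAT, and the root directory, and blocks 5 through 2559 start out on the free list. Formatting consists of writing the superblock, writing the FAT with that free chain, and writing a root directory whose "." and ".." entries both refer to the root itself.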

FAT Design

When I refer to a "FAT-like" filesystem, I mean the following:

Allocation is managed by an in-core table with one entry per filesystem block. Each entry contains either 0 or the number of another block. In toto, the table constitutes a set of linked lists of blocks.
The free list is a linked list (held in the in-core table) reached from the superblock. (An alternative would be to use the awful Microsoft FAT design, which marks free blocks with a special code and requires scanning the FAT to find free blocks.)
The on-disk copy of the block table is read at mount (filesystem initialization) time and is updated at your discretion (but note that your process might be killed at any time).
All file metadata is kept in the directory entry. At a minimum, this should include the file type (directory or file), the size in bytes, the name, and the number of the first block. (Subsequent blocks are located via the block table.) Other metadata, such as ownership, permissions, and timestamps, are up to you but are not required.
The block size is up to you, but it must be at least 512. (I recommend 4096, just to keep up with the modern world.)
Like any other filesystem, the on-disk data structures are stored in a single file (pseudo-disk) and are kept in binary. That means that if you choose to ignore my advice and write in a scripting language, you MAY NOT store things on disk in any form that is essentially text-based, such as JSON. (Of course, storing filenames as text is permissible and encouraged.) Also, your on-disk format must be designed by you specifically for your filesystem, and you must be able to describe it in sufficient detail that I could write a C program to decode it. (For example, Python's pickle formats are not acceptable.)
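The allocation rules above can be sketched with an in-core array of 32-bit entries. This sketch illustrates the rules under stated assumptions and is not a required design. It uses 0 to mark the end of a chain, which is safe because block 0 holds the superblock and can never be part of a file. The free-list head is kept in a field that would also be stored in the superblock. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FAT_END 0u   /* ends a chain; block 0 is the superblock, never a data block */

/* Hypothetical in-core state: the block table plus the free-list head. */
struct fat {
    uint32_t *table;      /* one entry per filesystem block */
    uint32_t  nblocks;
    uint32_t  free_head;  /* first free block, or FAT_END if the disk is full */
};

/* When formatting, link blocks first..nblocks-1 into one free list. */
void fat_init_free_list(struct fat *f, uint32_t first)
{
    for (uint32_t b = 0; b < f->nblocks; b++)
        f->table[b] = FAT_END;                 /* metadata blocks: on no chain */
    for (uint32_t b = first; b + 1 < f->nblocks; b++)
        f->table[b] = b + 1;                   /* the last free block stays FAT_END */
    f->free_head = (first < f->nblocks) ? first : FAT_END;
}

/* Pop one block off the free list; returns FAT_END if none are left. */
uint32_t fat_alloc(struct fat *f)
{
    uint32_t b = f->free_head;
    if (b == FAT_END)
        return FAT_END;
    f->free_head = f->table[b];
    f->table[b] = FAT_END;                     /* the new block is a one-block chain */
    return b;
}

/* Push a block back onto the head of the free list. */
void fat_free(struct fat *f, uint32_t b)
{
    assert(b != FAT_END && b < f->nblocks);
    f->table[b] = f->free_head;
    f->free_head = b;
}

/* Follow a chain from its first block to its n-th block (0-based);
 * returns FAT_END if the chain is shorter than that. */
uint32_t fat_nth_block(const struct fat *f, uint32_t first, uint32_t n)
{
    uint32_t b = first;
    while (n-- > 0 && b != FAT_END)
        b = f->table[b];
    return b;
}
```

With these helpers, mkdir would call fat_alloc() to get the new directory's block. Because the process can be killed at any moment, every change to table or free_head must also be written through to the on-disk FAT block and the superblock before the operation returns.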

For reference, my minimal implementation used a block size of 512 bytes (it was a while ago), had six fields in the superblock (including a magic number), and had four fields in the directory entry. To make it easy to store the superblock in a filesystem block, I used the following union:

union {
    struct fat_superblock s;
    char                  pad[512];
} superblock;

(Note that the superblock should be only 512 bytes, even if you use a different block size for your filesystem. That design makes it possible to read the superblock without knowing the block size, which is a useful feature. If the filesystem uses blocks larger than 512 bytes, the remaining space in the larger "block" is simply wasted.)
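The following sketch defines on-disk structures along those lines: a six-field superblock (including a magic number) padded to 512 bytes, and a four-field directory entry. The field names, the magic value, and the 32-byte name are assumptions for illustration. They are not the instructor's actual definitions. Fixed-width integer types and compile-time size checks keep the binary layout predictable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define SUPERBLOCK_SIZE 512
#define FAT_MAGIC       0x46415431u   /* hypothetical value */

/* Hypothetical six-field superblock. */
struct fat_superblock {
    uint32_t magic;
    uint32_t block_size;   /* power of 2, at least 512 */
    uint32_t nblocks;      /* total blocks on the disk */
    uint32_t fat_start;    /* first block of the on-disk FAT */
    uint32_t root_block;   /* first block of the root directory */
    uint32_t free_head;    /* head of the free list; 0 when the disk is full */
};

union superblock_pad {
    struct fat_superblock s;
    char                  pad[SUPERBLOCK_SIZE];
};
static_assert(sizeof(union superblock_pad) == SUPERBLOCK_SIZE,
              "the superblock record must be exactly 512 bytes");

/* Hypothetical four-field directory entry; the 64-bit size sits right after
 * the name so that the struct has no internal padding. */
struct fat_dirent {
    char     name[32];     /* NUL-terminated, so at most 31 characters here */
    uint64_t size;         /* size in bytes */
    uint32_t type;         /* e.g. 0 = unused slot, 1 = directory, 2 = file */
    uint32_t first_block;  /* start of the chain in the block table */
};
static_assert(sizeof(struct fat_dirent) == 48, "unexpected directory entry size");

/* Read the first 512 bytes; this works before the block size is known. */
int read_superblock(int fd, struct fat_superblock *out)
{
    union superblock_pad sb;
    if (pread(fd, &sb, SUPERBLOCK_SIZE, 0) != SUPERBLOCK_SIZE)
        return -1;                           /* short read: not formatted */
    if (sb.s.magic != FAT_MAGIC)
        return -1;                           /* not one of our disks */
    uint32_t bs = sb.s.block_size;
    if (bs < 512 || (bs & (bs - 1)) != 0)
        return -1;                           /* corrupt block size */
    *out = sb.s;
    return 0;
}
```

With 48-byte entries, a 4096-byte block holds 85 directory entries and a 512-byte block holds 10. Either is enough for the fixed-size directories the assignment allows.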

I also found it useful to create a few macros to do things like seeking to a particular block, converting back and forth between byte offsets and block numbers, etc.
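Macros of that kind might look like the following. Because the block size must be a power of 2, the conversions can use shifts and masks instead of division. BLOCK_SIZE, BLOCK_SHIFT, and all of the macro names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names; the block size must be a power of 2, at least 512. */
#define BLOCK_SIZE  4096
#define BLOCK_SHIFT 12                        /* log2(BLOCK_SIZE) */
static_assert((1 << BLOCK_SHIFT) == BLOCK_SIZE, "BLOCK_SHIFT must match BLOCK_SIZE");

/* Byte offset where block b starts, and the block that holds byte offset off. */
#define BLOCK_TO_OFFSET(b)    ((off_t)(b) << BLOCK_SHIFT)
#define OFFSET_TO_BLOCK(off)  ((uint32_t)((off) >> BLOCK_SHIFT))
/* Position of byte offset off within its block. */
#define OFFSET_IN_BLOCK(off)  ((uint32_t)((off) & (BLOCK_SIZE - 1)))
/* Blocks needed to hold n bytes, rounding up. */
#define BYTES_TO_BLOCKS(n)    (((n) + BLOCK_SIZE - 1) / BLOCK_SIZE)
/* Seek fd to the start of block b; yields lseek()'s result (-1 on error). */
#define SEEK_TO_BLOCK(fd, b)  lseek((fd), BLOCK_TO_OFFSET(b), SEEK_SET)
```

In practice, calling pread() or pwrite() with BLOCK_TO_OFFSET(b) as the offset performs the seek and the transfer in a single call.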

Important Notes

Note: You are supposed to be writing a real filesystem. The only differences from a true implementation of FAT should be:

It is backed by a plain file in the filesystem, rather than an actual disk, and
Your data structures are not required to be compatible with other FAT implementations (i.e., you don't have to be able to create or mount MS-DOS FAT disks).

In particular, this means not taking easy shortcuts. Like any filesystem, your implementation must satisfy the following criteria:

All access to the "disk" must be in multiples of the block size, which must be a power of 2 and must be 512 or greater.
Changes to files and directories must be reflected on disk immediately. No fair saving things in memory and then writing them out when you unmount.
Information must persist in the backing store after unmount.
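These rules interact: the in-core FAT is updated one entry at a time, but the disk may be accessed only in whole blocks. The minimal sketch below makes several assumptions: 4096-byte blocks, and 32-bit FAT entries stored in the host's byte order starting at block fat_start. All names are hypothetical. The sketch reads and writes single blocks with pread()/pwrite() and rewrites only the one FAT block that holds a changed entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE        4096                        /* hypothetical */
#define ENTRIES_PER_BLOCK (BLOCK_SIZE / sizeof(uint32_t))

/* Every transfer to or from the pseudo-disk is exactly one whole block. */
int read_block(int fd, uint32_t b, void *buf)
{
    return pread(fd, buf, BLOCK_SIZE, (off_t)b * BLOCK_SIZE) == BLOCK_SIZE ? 0 : -1;
}

int write_block(int fd, uint32_t b, const void *buf)
{
    return pwrite(fd, buf, BLOCK_SIZE, (off_t)b * BLOCK_SIZE) == BLOCK_SIZE ? 0 : -1;
}

/* Set one in-core FAT entry, then immediately rewrite the on-disk FAT block
 * that contains it, so the change survives even if the process is killed. */
int fat_set(int fd, uint32_t *table, uint32_t nblocks, uint32_t fat_start,
            uint32_t entry, uint32_t value)
{
    assert(entry < nblocks);
    table[entry] = value;

    uint32_t first = entry - (uint32_t)(entry % ENTRIES_PER_BLOCK); /* first entry in that block */
    uint32_t count = nblocks - first;
    if (count > ENTRIES_PER_BLOCK)
        count = ENTRIES_PER_BLOCK;           /* the last FAT block may be partly unused */

    unsigned char buf[BLOCK_SIZE];
    memset(buf, 0, sizeof buf);
    memcpy(buf, &table[first], count * sizeof(uint32_t));
    return write_block(fd, fat_start + (uint32_t)(first / ENTRIES_PER_BLOCK), buf);
}
```

Rewriting only the affected block means each FAT update costs a single block write instead of a flush of the whole table.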

You may also find it wise to review the requirements of Part 2 of this assignment to make sure you don't make a design decision that will back you into a corner.

Submission

Submit your code (it should be a single file) as assignment 2 with cs137submit. If you implement any additional features, describe them prominently in comments at the top of the file.

This page is maintained by Geoff Kuenning.