Abstract:
A constant problem in all fields of science is the difficulty of
collecting and sharing data. In file-systems research, that data
usually takes the form of "traces" that record system activity. For
example, a
trace might record that a certain file was opened, 1024 bytes were
read, and the file was closed, while simultaneously another file was
being deleted. Traces are difficult to collect and are often so large
(hundreds of gigabytes) that they are cumbersome to transfer over the
Internet.
Previous HMC students successfully built a trace repository and
populated it with a small amount of data. We will discuss the
challenges we encountered and overcame while enhancing, extending,
standardizing, and filling the repository.