Better Garbage Collection

You may not have thought much about what the garbage collector does for you. You probably assume it works. You probably also assume that however it operates, it probably has to work that way. So do lots of other people, but it isn't nearly so simple

Garbage collection was once the domain of “unconventional” programming languages such as Lisp and SmallTalk, while programmers working in mainstream languages managed their own memory. Today, garbage collection is mainstream. Both Java and the languages in Microsoft's .NET family use garbage collection, and in addition garbage collection has been retrofitted to C and C++. With this move into the mainstream comes new challenges, and new questions. Does the garbage collector always do the right thing? Can you have memory leaks in garbage collected programs, and if so, what should we be doing about it?

There is also the question of what else the garbage collector might be able to do for you. Even in functional languages, it is possible for the garbage collector to do more, and thereby collect more.

Students will explore two key questions — when do current garbage collection techniques fail and why, and what additional opportunities does the presence of a garbage collector provide. Current garbage collectors employ a number of assumptions about the the way data is used that may not be true for programs written in an imperative style, particularly those programs that make heavy use of arrays and thus do not use pointers to provide indirection.

As garbage collection becomes increasingly prevalent, it becomes more and more important to understand how the application of garbage collection affects traditional imperative algorithms and data structures. Papers in garbage collection appear in various venues, including journals and conferences in the programming languages area as well as specific conferences of memory management (such as ISMM).

Students will produce original research that enhances the state of the art in the area of garbage collection. They will begin by studying the basics of garbage collection, and then examine recent research in the field with a view to developing a topic that can be meaningfully studied in the time available..

Depending on the topic chosen, different paths are possible, but two possible topics would be

  1. Developing metrics that can describe the memory behavior of a program. In particular, ways that we can characterize the memory behavior of programs. As things currently stand in the field, there are a number of standard benchmarks that people use to test the performance of memory-management subsystems, yet amazingly, none of these has been shown to be representative of real programs. Without any to describe “what normal is”, we can't know how similar the memory behavior of one program is compared to another, or whether one language is like another or not.
  2. Developing ways to visualize the high-level memory behavior of programs. There are a lot of heap profilers out there that will give you low-level data, but very little that gives you any sense, at a structural level, of what is going on in the heap.
  3. Developing introspective memory-profiling techniques. Although a number of good profilers exist for a variety of languages, they often languish unused. Sometimes this is because you have to use a special version of the runtime system, or recompile the code. Sometimes it is because the profiling system doesn't provide the information we need. In introspective profiling, we collect information without changing the runtime system, using profiling code written in the same language as the target being profiled. (This was the focus of last year's project, and a lot of good progress was made, so we have a very good starting point.)
  4. Implementing and profiling simplifiers for a variety of applications, particularly in the field of functional programming.
  5. Implementing hierarchical garbage collection using blobs (a concept proposed by a previous summer's REU team).

Previous Work

In the REU program for summer 2010, I worked with Julia Astrauckas, Leah Hanson, Aaron Mininger on the first three of these issues, developing aztrace, which I encourage you to read about. Continuing their work would be an awesome way to spend your summer. Also, check out Aaron's testimonal on the REU blog page.

Mentor: Professor Melissa O'Neill

Melissa Melissa has been a professor at Harvey Mudd since 2001. She is particularly interested in topics in the systems area of computer science, specifically parallel and distributed systems and programming languages. Her research is centered on techniques to make programming easier and more reliable. She received her B.Sc. in from the University of East Anglia in England, and her M.Sc. and Ph.D. from Simon Fraser University in Canada. Outside of Computer Science, she has occasionally been found performing hypnosis, defending the scientific view of the world, training cats to do tricks, subvertizing (twisting advertising and media images to her own ends) and watching Japanese anime (particularly Inuyasha).

Required Background

Student researchers on this project will be expected to have a solid background in the systems area, ideally having taken courses on operating systems and programming language design/implementation. Good programming skills are essential, as are good skills in written expression. Experience with functional programming languages such as Standard ML or Haskell is also valuable but not required. Knowledge of statistical techniques may be helpful.