Papers, etc.

Contents

Here are some things you might be interested in reading…

Random Number Generation

I've had a longstanding interest in computational creativity and randomized algorithms, and as a result I've had a longstanding interest in random number generation, and in some other related areas like hashing. But I nevertheless never expected to write a paper about random-number generation. It happened quite by chance…

PCG Family of Random Number Generators

“PCG: A Family of Simple Fast Space-Efficient Statistically Good Algorithms for Random Number Generation”, Melissa E. O'Neill, submitted to ACM Transactions on Mathematical Software (47 pages)

What's in the Paper

Although it includes technical details, the paper is written to be accessible to a broad audience.

The paper

Articulates in detail the desirable properties of a random number generator including performance, correctness, uniformity, and unpredictability, as well as sound mathematical grounding. It also describes some less well-known desirable features, such as k-dimensional equidistribution and seekability (a.k.a. jump-ahead).
Describes a new permutation technique, founded on the idea of permutation functions on tuples, that can dramatically improve the output quality of a medium-quality random number generator while preserving important qualities such as uniformity.
Develops a new random number generation technique, using a base RNG with weak statistical properties but other desirable ones (specifically, a linear congruential generator) combined with an output function based on permutation functions on tuples.

The name for the family, PCG, stands for permuted congruential generator, combining both concepts that underly the generation scheme, namely permutation functions on tuples and a base linear congruential generator.

In addition to describing the PCG family, the paper also assesses its performance. To further that goal, the paper

Develops a model for the performance of an ideal uniform random number generator with b bits of state, including the notion of the point at which such a generator becomes overtaxed and the constraints of uniformity make it unable to deliver truly random behavior.
Determines an approximation of the point at which any uniform generator, even an ideal one, could be expected to fail TestU01's statistical tests.
Presents a new way to contrast the statistical performance of different random number generation schemes capturing the concept of headroom to pass more stringent tests in the future.
Uses the above statistical-performance comparison scheme, as well as time and space performance, to contrast PCG generators with existing ones.

Check out pcg-random.org for details on the PCG family of random number generators which provides a combination of features rarely seen on existing generators.

CS Education + Functional Programming

Sieve of Eratosthenes

“The Genuine Sieve of Eratosthenes” (PDF), Journal of Functional Programming, 19(1), January 2009, pp 95–106.

What's in the Paper

For more than 30 years, people used the following example to show the elegance of functional programming:

primes = sieve [2..]
sieve (p : xs) = p : sieve [x | x <− xs, x ‘mod‘ p > 0]

People claimed that this code showed the Sieve of Eratosthenes; unfortunately however, it isn't that algorithm.

This paper shows

Why this widely-seen implementation is not the Sieve of Eratosthenes (and is in fact an inefficient implementation of trial division);
How an algorithm that is the Sieve of Eratosthenes may be written in a lazy functional style; and
How our choice of data structure matters.

It also analyses time complexity of prime-finding methods described.

CS Education + Psychology

Classroom Climate

Framing Classroom Climate for Student Learning and Retention in Computer Science, Leicia Barker, Melissa O'Neill and Nida Kazim (PDF, SIGCSE 2014)

But several people who saw the SIGCSE talk, wanted to have the presentation slides.

What's in the Paper?

This paper is about some of the techniques I apply when teaching. The basic idea is that teachers also need to be applied psychologists, because our classrooms are a minefield of subtle influences. When we design classes, we can take account of framing effects and use them to create a better classroom experience.

Generic Programming + Functional Programming

Genetic Programming with SK-Combinators

“Functional Genetic Programming and Exhaustive Program Search with Combinator Expressions” (PDF), Forrest Briggs & Melissa O'Neill, KES Journal, 2008

What's in the Paper?

The field of generic programming often struggles with the problem of avoiding invalid programs. This paper shows that combinator expressions are a useful program representation for genetic programming, because they provide the expressive power of high-level strongly-typed functional programming languages while avoiding the problems of variables by eliminating them. Specifically, it shows that

Combinator expressions can represent programs that introduce local variables, but the genetic operators to manipulate combinator expressions do not require special cases for variables;
Generating combinator expressions is efficient compared to prior work on several problems (including even parity on N inputs, and devising representations and implementations for stacks and queues);
Sometimes exhaustive enumeration of all possible programs is competitive with genetic programming; and,
Some purportedly difficult GP problems become trivial to solve with exhaustive enumeration, given the constraints of types and well-formed programs.

Parallelism, Concurrency, etc.

Parallelism and concurrency are hard. If we can make it easier, that'd be great.

Observationally Cooperative Multithreading

“Observationally Cooperative Multithreading”, Christopher A. Stone, Melissa E. O'Neill, and the student OCM team OOPSLA '11, (PDF, 2 pages)

But you should really read this much longer version of the paper (PDF, 16 pages).

What is OCM?

Observationally Cooperative Multithreading (OCM) is a new approach to shared-memory parallelism. It addresses a key problem of mainstream concurrency control mechanisms—they can be prohibitively hard to reason about and debug. Programmers using OCM simply write code as if they were using the cooperative multithreading model (CM) for uniprocessors. The underlying OCM implementation then optimizes execution—running threads in parallel when possible - in such a way that the results are consistent with CM. In addition to providing easier reasoning and debugging, OCM is also highly adaptable in terms of its underlying concurrency-control mechanism. Programmers using OCM have the capability to take a finished program and choose the strategy (e.g., locks or transactions) that provides optimal performance.