Professor Christopher A. Stone
Computer Science Department
Harvey Mudd College
301 Platt Boulevard, Claremont, CA 91711
Email: stone@cs.hmc.edu
Phone: +1 909 607-8975
Office: McGregor 320

Teaching

What are you teaching this semester?

In Spring 2024, I'm teaching two sections of CS 70: Data Structures and Program Development.

I'm also in charge of the independent-study course CS 189: Programming Practicum and advising a team of students in the CS Clinic Program.

When are your office hours?

Mondays 4:15-5:15pm, Tuesdays 9:30-11am, Thursdays 3-4pm, and Fridays 3-4pm. For CS 70 students, Piazza posts are also a good alternative, and grutors are frequently available in the Pair Programming Lab.

Research

Can computers read proofs?

Definitely!

There has been remarkable progress on proof assistants like Coq, Isabelle, Agda, and Lean. Humans direct the high-level strategy (anywhere from breaking a goal into subgoals and applying simple logical steps, to invoking sophisticated proof-search tactics), and the proof assistant fills in the details. The result is a proof with no ambiguities or omissions, in which every step is justified; the computer can verify that the steps fit together correctly and that the proof is flawless.

In some parts of Computer Science, it's now typical for papers claiming a theoretical result to be accompanied by a computer-checked proof. Proof assistants have been making inroads into pure mathematics as well; a well-known example is the Flyspeck Project that used proof assistants to verify once and for all Thomas Hales's intricate proof of the Kepler Conjecture (i.e., that there is no denser packing of spheres than the familiar "pile of cannonballs" arrangement).

Can humans read proofs written for computers?

Often not. Proofs written for humans look like this:
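
Consider the simple fact that n + 0 = n for every natural number n (a small example of my own choosing). A textbook proof might read:

  Theorem. For every natural number n, we have n + 0 = n.

  Proof. We proceed by induction on n. If n = 0, then 0 + 0 = 0 by the definition of addition. Otherwise n = k + 1 for some natural number k, and the induction hypothesis gives k + 0 = k. Then n + 0 = (k + 1) + 0 = (k + 0) + 1 = k + 1 = n. ∎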

but popular proof assistants expect something much less readable, like:
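
Here, for instance, is one way the same fact might be proved in Coq (a sketch; the lemma name and the tactic choices are mine):

  Theorem plus_n_O : forall n : nat, n + 0 = n.
  Proof.
    intros n. induction n as [| k IHk].
    - reflexivity.
    - simpl. rewrite IHk. reflexivity.
  Qed.

Every line is a command to the proof assistant rather than an explanation for a reader: induction sets up the case split, simpl unfolds the definition of addition, and rewrite applies the induction hypothesis IHk.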

One might worry that these proofs would be impossible to write, but proof assistant IDEs give huge amounts of guidance, and let the user focus on one small piece of the proof at a time. It's only when all the pieces are put together that it becomes impossible to see the forest for the trees.

Does that matter?

It's fine to construct “write only” proofs when we only care that the final conclusion is true. If we just want to confirm that our algorithm sorts all inputs correctly, that our cryptographic protocol is secure, or that our programming language is type-safe, then any correct proof will do.

But in traditional mathematics, proofs are intended to be read and understood by humans; they give the reader broader insights into the topic by explaining why the result is true.

The disconnect is unfortunate. For example, the authors of the book Homotopy Type Theory verified all their claims using Coq before releasing the book in 2013. Yet to this day, the authors continue receiving reports of errors in the theorems and proofs in their book! The problem is that the theorems and proofs in the book were (re)written in the “traditional” fashion so readers could understand them. These English-language proofs were not formally verified, and typos and more serious errors slipped through.

Could computers read proofs written for humans?

A system that understands English-language proofs would be a great tool for students learning how to write proofs, and for researchers who want to check their proofs before publication. Computers already do spell-checking and grammar-checking, so why not logic-checking too?

There are proof assistants like Mizar, Isar, and Naproche whose input is more human readable, e.g.,
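
For instance, in Isabelle's Isar language the n + 0 = n fact from above can be written in a structured, almost prose-like style (again a sketch of my own, not taken from these systems' documentation):

  theorem "n + 0 = (n::nat)"
  proof (induct n)
    case 0
    show ?case by simp
  next
    case (Suc k)
    then show ?case by simp
  qed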

The results range from stilted to impressive, but all of these systems require the user to write using a predetermined set of sentence forms, which may not be how they would prefer to write.

How do people use language in mathematical proofs?

This is something I have been investigating recently, in collaboration with multiple undergraduate researchers. Being able to answer this question would tell us how close systems like Isar and Naproche are to handling input from arbitrary users.

We first downloaded over 1.6M papers from the popular arXiv.org preprint site, and found that 352K of these contained at least one proof. We then wrote a script to extract proofs from the LaTeX source code of these papers, resulting in 3.7M proofs (containing 558M words total). Because we were interested in the use of language, and not the technical results, we replaced all mathematical formulas by the placeholder MATH, resulting in sentences like Let MATH be the restriction of MATH to MATH.
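
A minimal sketch of the formula-masking step, in Python (real LaTeX sources require handling far more math environments than this simplified version does):

  import re

  def mask_math(latex: str) -> str:
      """Replace display and inline math in LaTeX source with MATH."""
      latex = re.sub(r'\\\[.*?\\\]', 'MATH', latex, flags=re.DOTALL)  # \[ ... \]
      latex = re.sub(r'\$\$.*?\$\$', 'MATH', latex, flags=re.DOTALL)  # $$ ... $$
      latex = re.sub(r'\$[^$]+\$', 'MATH', latex)                     # $ ... $
      return latex

  print(mask_math(r"Let $f|_A$ be the restriction of $f$ to $A$."))
  # -> Let MATH be the restriction of MATH to MATH.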

We have spent most of our time building this language corpus, and have only scratched the surface of analyzing the language. A few preliminary observations:

  • The sentence Without loss of generality we may assume that MATH. is twice as common as Without loss of generality we assume that MATH. and five times as common as We may assume without loss of generality that MATH.
  • Mechanically extracted two-word collocations (words that appear together more often than can be explained by chance) include ad absurdum, almost surely, finitely many, pigeon hole, roughly speaking, and spherical harmonics. Longer collocations include induces an isomorphism between and it suffices to check that.
  • Off-the-shelf NLP tools can perform very badly on mathematical text. For example, when trying to identify the part of speech of each word in the sentence Suppose MATH is even., the widely-used NLTK library identifies Suppose MATH as two proper nouns instead of a verb and a noun, and identifies even as an adverb instead of an adjective. But by retraining NLTK's part-of-speech tagger on a small collection of imperative sentences extracted from our data, we can get much better results (see the sketch after this list).
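
A rough sketch of that retraining step in Python, using NLTK's trainable perceptron tagger (the two tagged sentences below are placeholders for our actual hand-tagged training data):

  from nltk.tag.perceptron import PerceptronTagger

  # Placeholder training data: imperative sentences extracted from the
  # proof corpus and hand-tagged with the correct parts of speech.
  tagged_sentences = [
      [('Suppose', 'VB'), ('MATH', 'NN'), ('is', 'VBZ'), ('even', 'JJ'), ('.', '.')],
      [('Let', 'VB'), ('MATH', 'NN'), ('be', 'VB'), ('arbitrary', 'JJ'), ('.', '.')],
  ]

  tagger = PerceptronTagger(load=False)      # start from an empty model
  tagger.train(tagged_sentences, nr_iter=5)  # several averaged-perceptron passes

  print(tagger.tag(['Assume', 'MATH', 'is', 'odd', '.']))

In practice one would train on far more sentences, and likely mix in general-purpose tagged text so the tagger still handles ordinary English.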

What's next?

I've recently been thinking about using this large corpus of proofs to guide the creation of a formal grammar for the most common sentence forms found in mathematical proofs. This would still be a very limited subset of English, but a good design would produce a language that feels “big” and “natural”.

Grammar engineering is a hard problem, so I've been thinking about ways that tools could help. In particular, I've been investigating new algorithms for efficiently generating random sentences (or all sentences) from a proposed grammar, to help debug that grammar. There are a number of such algorithms in the literature for Context-Free Grammars, but not for more sophisticated non-context-free formalisms like Combinatory Categorial Grammars that may be more suitable for English-language proofs.
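
As a toy illustration of the context-free case, NLTK can already enumerate every sentence of a small grammar; the fragment below is invented for this example, not our actual grammar:

  from nltk import CFG
  from nltk.parse.generate import generate

  grammar = CFG.fromstring("""
  S -> 'let' 'MATH' 'be' NP | 'suppose' 'MATH' 'is' ADJ
  NP -> DET ADJ N | DET N
  DET -> 'a' | 'an'
  ADJ -> 'even' | 'arbitrary'
  N -> 'integer' | 'subset'
  """)

  for words in generate(grammar):
      print(' '.join(words))

Even this tiny grammar overgenerates: it produces "an even integer" but also "a even integer", exactly the kind of bug that exhaustive (or random) generation makes easy to spot.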

Why not just use a Large Language Model?

In principle, an LLM like ChatGPT could be used to validate proofs. But for proof checking, there's a huge difference between a system that works most of the time and one that is reliable all of the time (and perhaps even provably reliable)!

LLMs are designed to give answers that sound plausible, and “Yes, that sounds right to me.” is a depressingly plausible answer to a long, complicated, and incorrect proof.

More generally, even if you asked ChatGPT to translate the English-language proof into a formal system like Coq or Lean and then had a computer verify the result, there's always the question of hallucination: does the translation always preserve the meaning of the input proof, or might the response to an incorrect input proof be a more plausible (i.e., corrected) Coq or Lean proof, or a correct proof for a different theorem?

In contrast, a theory-driven system for checking natural-language proofs is likely to be less flexible, but also much more predictable, with explainable output.

Why not write a translator from computer proofs to English?

There has been a lot of interesting work over the years on translating “formal” (mathematically precise) proofs into readable English. The main disadvantage is that proofs constructed using a proof assistant tend to be very different from proofs written for humans.

For example, suppose we are doing a proof about all the different ways that 10 things can interact. A human using a proof assistant might tell the system to consider all 100 cases, apply automated tactics to each case, and report back which cases were too tricky for automation and require more guidance from the user. Translating this to English yields a proof listing 100 separate cases, most of which will be fairly boring. In contrast, a human approaching the same problem would likely put in thought ahead of time, using reasoning principles like symmetry and invariants to explain why only a small number of cases matter, and write a proof that considers just those.
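
In Coq, for example, the tactic-driven version of that case analysis can be as terse as this schematic fragment (with x and y standing in for the interacting things):

  (* Consider every combination of cases for x and y, and try to
     discharge each one automatically; the cases that survive are
     the ones that need human guidance. *)
  destruct x; destruct y; try (simpl; auto).

The symmetry argument a human would use is never written down in such a script, so a translation to English has no choice but to enumerate the cases one by one.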

Also, a lot of proofs are written in English to start with, and translating computer proofs to English doesn't help with those.

What else have you worked on?

Topics include:

Personal

Why are you trying to learn Slovenian?

My wife Sneža is from Slovenia, and we visit her relatives there every year or two. I also have a longtime collaborator, Prof. Andrej Bauer, who teaches in the Mathematics Department of the University of Ljubljana, Slovenia.