A system developed by Google’s DeepMind has set a new record for AI performance on geometry problems. DeepMind’s AlphaGeometry managed to solve 25 of the 30 geometry problems drawn from the International Mathematical Olympiad between 2000 and 2022.
That puts the software ahead of the vast majority of young mathematicians and just shy of IMO gold medalists. DeepMind estimates that the average gold medalist would have solved 26 out of 30 problems. Many view the IMO as the world’s most prestigious math competition for high school students.
“Because language models excel at identifying general patterns and relationships in data, they can quickly predict potentially useful constructs, but often lack the ability to reason rigorously or explain their decisions,” DeepMind writes. To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.
The research was led by Trieu Trinh, a computer scientist who recently earned his PhD from New York University. He was a resident at DeepMind between 2021 and 2023.
Evan Chen, a former Olympiad gold medalist who evaluated some of AlphaGeometry’s output, praised it as “impressive because it's both verifiable and clean.” Whereas some earlier software generated complex geometry proofs that were hard for human reviewers to understand, the output of AlphaGeometry is similar to what a human mathematician would write.
AlphaGeometry is part of DeepMind’s larger project to improve the reasoning capabilities of large language models by combining them with traditional search algorithms. DeepMind has published several papers in this area over the last year.
How AlphaGeometry works
Let’s start with a simple example shown in the AlphaGeometry paper, which was published by Nature on Wednesday:
The goal is to prove that if a triangle has two equal sides (AB and AC), then the angles opposite those sides will also be equal. We can do this by creating a new point D at the midpoint of the third side of the triangle (BC). It’s easy to show that all three sides of triangle ABD are the same length as the corresponding sides of triangle ACD. And two triangles with equal sides always have equal angles.
Geometry problems from the IMO are much more complex than this toy problem, but fundamentally, they have the same structure. They all start with a geometric figure and some facts about the figure like “side AB is the same length as side AC.” The goal is to generate a sequence of valid inferences that conclude with a given statement like “angle ABC is equal to angle BCA.”
For many years, we’ve had software that can generate lists of valid conclusions that can be drawn from a set of starting assumptions. Simple geometry problems can be solved by “brute force”: mechanically listing every possible fact that can be inferred from the given assumption, then listing every possible inference from those facts, and so on until you reach the desired conclusion.
But this kind of brute-force search isn’t feasible for an IMO-level geometry problem because the search space is too large. Not only do harder problems require longer proofs, but sophisticated proofs often require the introduction of new elements to the initial figure—as with point D in the above proof. Once you allow for these kinds of “auxiliary points,” the space of possible proofs explodes and brute-force methods become impractical.