Aleph Prover Uses Formal Methods to Disproof Erdős Theorem

Logical

Intelligence

Logical

Intelligence

Logical

Intelligence

Aleph Prover

Formalized Planar

Unit Problem Disprove

Aleph Prover

Formalized Planar

Unit Problem Disprove

May 28, 2026

Alex Fetisov

Last week, OpenAI demonstrated that AI systems can help discover new mathematics approaches in its recent counterexample to a conjecture of Paul Erdős announcement. We at Logical Intelligence took that as a challenge as well and formally verified the result in Lean 4 using the latest version of our Aleph Prover agentic system. Aleph Prover has formalized the disproof of the planar unit problem, following the recent OpenAI result that used AI to find a counterexample to a conjecture of Paul Erdős.

The result is now a machine-checkable artifact in Lean 4.

That puts the new direction on research in different areas like mathematics and computer science, where AI can significantly accelerate progress around research. It’s equally important not only to prove things, but to find counter examples as well (since researchers spend a lot of time on it as well), so research can move much faster.

This is important because complex mathematical results depend on large proof structures with many intermediate lemmas. Formalization gives us a way to validate those results programmatically, reduce the amount of proof surface that experts must inspect manually, and reuse verified components in future work.

For AI research, this converts a hard mathematical discovery into verified proof data that models can learn from and future systems can build on.

The direction is clear. AI systems can generate candidate proofs and programs while formal methods can verify them or create counter examples at scale. Aleph Prover generates the formal proof structure and Lean 4 is the trusted verifier. That ensures that correctness is fully deterministic and does not depend on trusting the AI system. This is the foundation for verifiable AI and the next generation of formally verified code generation.

The result is now a machine-checkable artifact in Lean 4.

For AI research, this converts a hard mathematical discovery into verified proof data that models can learn from and future systems can build on.

Motivation

We recently posted about Aleph Prover results on top formal-reasoning benchmarks. Aleph Prover is our in-house prover agent.

We built Aleph Prover to solve and formalize increasingly complex problems at scale - faster, cheaper, and with stronger verification guarantees.

Working on benchmarks helped us identify the limits of the previous version of Aleph Prover. Over the last few months, we have been developing a new version designed to handle more complex proof search and formalization tasks.

The recent OpenAI result on the planar unit distance conjecture presented a compelling challenge for formal verification. The proof involved a complex chain of mathematical dependencies and was exactly the kind of long horizon formalization problem we designed the next generation of Aleph Prover to tackle. The previous version of Aleph Prover could not formalize it, making it an ideal benchmark for our latest iteration. The new system succeeded, converting the result into a machine-checkable proof, providing additional confidence that we are moving in the right direction towards verifiable AI.

Motivation

We recently posted about Aleph Prover results on top formal-reasoning benchmarks. Aleph Prover is our in-house prover agent.

We built Aleph Prover to solve and formalize increasingly complex problems at scale - faster, cheaper, and with stronger verification guarantees.

Results

The full formalization is available in our public GitHub repository.

Some statistics from the proof:

Results

The full formalization is available in our public GitHub repository.

Some statistics from the proof:

The process required integrating 252 Mathlib modules. Our formalization draws on a wide mathematical toolkit: analytic prime estimates, class field towers, pro-p Galois groups, Golod-Shafarevich methods, continuous group cohomology, supplemented by measure-theoretic lattice counting and Euclidean geometry.

What does a 33,087-line proof actually look like?

The formalization currently is conditional on 2 external textbook theorems from the textbook (look at the appendix section for more details). These results are not yet available in Mathlib, so we treated them as trusted assumptions for this version of the proof. We initially started with around 30 such dependencies and reduced the surface to 2.

Human-in-the-loop and Validation

We had one iteration of Aleph Prover, then a human review of a small part of the proof, followed by a second iteration of Aleph Prover, which produced the final result. Thank you to everyone who participated in the review!

As you know, LLMs sometimes hallucinate and produce something nonsensical. Sometimes they even intentionally try to fool checkers and produce something that only looks correct.

In the case of Lean 4, most issues can be checked using the compiler, checkers, and a comparator. This allows us to run many iterations of code improvement without human involvement.

The aspect of the proof that Lean 4 cannot verify are the external references to classical theorems which have not yet been formalized. Seeing as only a small part of mathematics has been formalized in Lean so far, it is a tall order to formalize every classical theorem used in OpenAI’s proof.

The final output of Aleph Prover relied on two such external results in its proof. Therefore, we needed to check three things:

That the problem statement itself was correctly translated from English into Lean 4.
That the two external theorems were formulated in Lean 4 exactly the same as in standard textbook references.
That the prover did not somehow manage to trick us or the Lean 4 kernel.

We checked the first two points with the help of professional mathematicians. Thank you to our external reviewers Sergey Galkin, Mark Obozov, Browning, Thomas L, Kevin Buzzard, Johan Commelin, as well as to our internal team of mathematicians: Michael Freedman, Michael Mulligan, and Milo Moses. Special thanks to Kevin Buzzard, who, after the first iteration of Aleph Prover’s work, found errors in the statements of the external theorems and provided the feedback needed for Aleph Prover to correct them in the second iteration.

We checked the third point by building the project and running the Comparator tool, which re-checks the proof with two independent Lean kernels and certifies that it matches the stated theorem with no hidden axioms.

We are releasing the formalization as open source so that other researchers can inspect, extend, and independently validate the result.

Some of the external theorems were already formalized in the repository https://github.com/AlexKontorovich/PrimeNumberTheoremAnd. Aleph Prover found this repository on its own and suggested reusing those formalizations. We thank Alex Kontorovich and the broader community of mathematicians contributing to this project.

Human-in-the-loop

and Validation

As you know, LLMs sometimes hallucinate and produce something nonsensical. Sometimes they even intentionally try to fool checkers and produce something that only looks correct.

In the case of Lean 4, most issues can be checked using the compiler, checkers, and a comparator. This allows us to run many iterations of code improvement without human involvement.

The final output of Aleph Prover relied on two such external results in its proof. Therefore, we needed to check three things:

That the problem statement itself was correctly translated from English into Lean 4.
That the two external theorems were formulated in Lean 4 exactly the same as in standard textbook references.
That the prover did not somehow manage to trick us or the Lean 4 kernel.

We are releasing the formalization as open source so that other researchers can inspect, extend, and independently validate the result.

What’s next?

This problem raises the bar for AI-assisted formalization, and the new version of Aleph Prover was able to meet it.

This formalization should not be misinterpreted as evidence of fully autonomous mathematical reasoning. Rather, it shows that AI assisted theorem formalization is possible to scale in ways that were previously impossible to operationalize.

OpenAI demonstrated that AI systems can in fact help discover new mathematics. Formal verification systems like Aleph Prover help convert those discoveries into reusable, machine-checkable infrastructure, establishing a new verification layer in AI.

What’s next?

This problem raises the bar for AI-assisted formalization, and the new version of Aleph Prover was able to meet it.

Appendix

Aleph Prover did not formalize two of the classical theorems used in OpenAI’s proof. The results are stated below.

The Golod-Shafarevich inequality (1964): For every prime `p` and every nontrivial finite `p`-group `Q`, there is an inequality

```

d(Q)^2 < 4 r(Q).

```

Here `d(Q)` is the generator rank, defined as the minimal size of a generating set of `Q`, and `r(Q)` is the relation rank, defined as the dimension of the 2nd cohomology group `H^2(Q; F_p)` as a vector space over the finite field `F_p`.

Reference: NSW, Theorem 3.9.7; Serre, Chapter I, Appendix 2, Theorem 1.

Shafarevich’s rank-relation bound (1963): Let `F` be a number field, and let `p` be an odd rational prime. Let `F^{un,p}/F` denote the maximal unramified pro-`p` extension of `F`, and let

```

G = Gal(F^{un,p}/F).

```

The relation rank of G is finite, and it satisfies

```

r(G) <= d(G) + r_1 + r_2 - 1 + δ_p(F).

```

Here `r(G)` is the relation rank of `G`, defined as the `F_p`-dimension of `H^2(G; F_p)`; `d(G)` is the generator rank of `G`, defined as the minimal size of a topological generating set of `G`; `r_1` is the number of real places of `F`; `r_2` is the number of complex places of `F`; `δ_p(F)` is `1` if `F`contains a primitive `p`-th root of unity, and `0` otherwise.

Reference: Mayer, Theorem 5.1 with `S = ∅`; alternatively, combine Koch, Theorems 11.5 and 11.8.

Bibliography:

NSW: J. Neukirch, A. Schmidt, and K. Wingberg, Cohomology of Number Fields, 2nd ed., Grundlehren der mathematischen Wissenschaften 323, Springer-Verlag, Berlin, 2008. DOI: 10.1007/978-3-540-37889-1.
Koch: H. Koch, Galois Theory of p-Extensions, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2002. DOI:
10.1007/978-3-662-04967-9.
Serre: J.-P. Serre, Galois Cohomology, translated from the French by Patrick Ion, Springer Monographs in Mathematics, Springer-Verlag, Berlin, corrected 2nd printing, 2002. DOI: 10.1007/978-3-642-59141-9.
Mayer: D. C. Mayer, "New number fields with known p-class tower," Tatra Mountains Mathematical Publications 64 (2015), 21-57. DOI: 10.1515/tmmp-2015-0040; arXiv: 1510.00565.

Appendix

Aleph Prover did not formalize two of the classical theorems used in OpenAI’s proof. The results are stated below.

The Golod-Shafarevich inequality (1964): For every prime `p` and every nontrivial finite `p`-group `Q`, there is an inequality

```

d(Q)^2 < 4 r(Q).

```

Reference: NSW, Theorem 3.9.7; Serre, Chapter I, Appendix 2, Theorem 1.

Shafarevich’s rank-relation bound (1963): Let `F` be a number field, and let `p` be an odd rational prime. Let `F^{un,p}/F` denote the maximal unramified pro-`p` extension of `F`, and let

```

G = Gal(F^{un,p}/F).

```

The relation rank of G is finite, and it satisfies

```

r(G) <= d(G) + r_1 + r_2 - 1 + δ_p(F).

```

Reference: Mayer, Theorem 5.1 with `S = ∅`; alternatively, combine Koch, Theorems 11.5 and 11.8.

Bibliography:

NSW: J. Neukirch, A. Schmidt, and K. Wingberg, Cohomology of Number Fields, 2nd ed., Grundlehren der mathematischen Wissenschaften 323, Springer-Verlag, Berlin, 2008. DOI: 10.1007/978-3-540-37889-1.
Koch: H. Koch, Galois Theory of p-Extensions, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2002. DOI: 10.1007/978-3-662-04967-9.
Serre: J.-P. Serre, Galois Cohomology, translated from the French by Patrick Ion, Springer Monographs in Mathematics, Springer-Verlag, Berlin, corrected 2nd printing, 2002. DOI: 10.1007/978-3-642-59141-9.
Mayer: D. C. Mayer, "New number fields with known p-class tower," Tatra Mountains Mathematical Publications 64 (2015), 21-57. DOI: 10.1515/tmmp-2015-0040; arXiv: 1510.00565.

Appendix

Aleph Prover did not formalize two of the classical theorems used in OpenAI’s proof. The results are stated below.

The Golod-Shafarevich inequality (1964): For every prime `p` and every nontrivial finite `p`-group `Q`, there is an inequality

```

d(Q)^2 < 4 r(Q).

```

Reference: NSW, Theorem 3.9.7; Serre, Chapter I, Appendix 2, Theorem 1.

Shafarevich’s rank-relation bound (1963): Let `F` be a number field, and let `p` be an odd rational prime. Let `F^{un,p}/F` denote the maximal unramified pro-`p` extension of `F`, and let

```

G = Gal(F^{un,p}/F).

```

The relation rank of G is finite, and it satisfies

```

r(G) <= d(G) + r_1 + r_2 - 1 + δ_p(F).

```

Reference: Mayer, Theorem 5.1 with `S = ∅`; alternatively, combine Koch, Theorems 11.5 and 11.8.

Bibliography:

NSW: J. Neukirch, A. Schmidt, and K. Wingberg, Cohomology of Number Fields, 2nd ed., Grundlehren der mathematischen Wissenschaften 323, Springer-Verlag, Berlin, 2008. DOI: 10.1007/978-3-540-37889-1.
Koch: H. Koch, Galois Theory of p-Extensions, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2002. DOI:
10.1007/978-3-662-04967-9.
Serre: J.-P. Serre, Galois Cohomology, translated from the French by Patrick Ion, Springer Monographs in Mathematics, Springer-Verlag, Berlin, corrected 2nd printing, 2002. DOI: 10.1007/978-3-642-59141-9.
Mayer: D. C. Mayer, "New number fields with known p-class tower," Tatra Mountains Mathematical Publications 64 (2015), 21-57. DOI: 10.1515/tmmp-2015-0040; arXiv: 1510.00565.

logical

intelligence

Home

Home

Kona 1.0

Kona 1.0

Aleph

Aleph

Press

Press

Follow

Follow

Careers

Careers

Contact

Contact