OpenAI's AI disproved an 80-year Erdős conjecture
The eighty-year wall that finally fell
A wall stood for eighty years, and a language model knocked a hole in it. On May 20, 2026, OpenAI published a result that no AI company had previously been able to claim with a straight face: an internal general-purpose reasoning model had, without scaffolding, fine-tuning, or domain-specific prompting, disproved a conjecture in discrete geometry first whispered by Paul Erdős in 1946 and stared at by combinatorists ever since. Per OpenAI’s announcement of the discrete-geometry result, the model produced a construction that beats the square-grid lower bound on the planar unit distance problem — a problem that asks, with deceptive simplicity, how many pairs of points placed on a flat plane can sit at exactly one unit apart. For decades the working assumption was that the square lattice was essentially optimal. The new construction proves it is not. The bound that almost everyone believed was tight is, in fact, beatable, and the system that found the beating construction is the same general-purpose architecture that writes code and answers customer support tickets.
The stakes attached to this single result are larger than they look. Per TechCrunch’s coverage of OpenAI’s renewed math claim, this is the first time leading mathematicians have independently verified an autonomous AI contribution to a frontier open problem. Seven months ago, Kevin Weil, then an OpenAI VP, announced that GPT-5 had solved ten previously unsolved Erdős problems and made progress on eleven more. The entire claim collapsed within forty-eight hours when Thomas Bloom, who maintains the canonical Erdős problems site, pointed out that every “solution” was already in the literature. Per TechCrunch’s original report on that earlier failure, Demis Hassabis called the episode embarrassing and Yann LeCun mocked OpenAI for being “hoist by their own GPTards.” Weil deleted the post and exited the company in April. The new result lands inside that crater, and it lands with a 19-page companion paper signed by nine working mathematicians, including a Fields Medalist. The claim survives because the verification chain survives.
What that verification means in practice is worth being precise about. Per Scientific American’s reporting on the proof, the model’s argument did not iterate on grid arrangements. It replaced the Gaussian integers, the classical mathematical object that powers the square-grid construction, with infinite class field towers — structures from algebraic number theory most working geometers have never deployed against a unit-distance question. The proof’s correctness depends on the existence of certain number fields with very specific ramification behavior, and the existence proof leans on Golod–Shafarevich theory, a piece of mathematical apparatus that even most graduate combinatorists would describe as exotic in this context. The model did not stumble onto an obscure paper and translate it. It assembled the connection. That assembly is what the nine mathematicians on the verification paper checked, line by line, before publishing alongside the OpenAI release.
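To make the square-grid baseline concrete, here is a minimal brute-force sketch. It is not taken from the OpenAI proof or the verification paper, and the function names are illustrative. It counts pairs of integer grid points at a chosen squared distance; working with squared integer distances keeps the arithmetic exact, and rescaling the grid so that the chosen distance becomes one unit turns the count into a unit-distance count. The classical Gaussian-integer idea is that distances whose squares can be written as a sum of two squares in many ways are realised by many pairs.

```python
from itertools import combinations


def square_grid(m):
    """Return the m*m integer points (x, y) with 0 <= x, y < m."""
    return [(x, y) for x in range(m) for y in range(m)]


def count_pairs_at_squared_distance(points, d2):
    """Count unordered pairs of points whose squared distance equals d2."""
    return sum(
        1
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if (x1 - x2) ** 2 + (y1 - y2) ** 2 == d2
    )


def best_squared_distance(m):
    """Return (d2, count) for the squared distance realised by the most pairs.

    Ties are broken in favour of the smaller squared distance.
    """
    counts = {}
    for (x1, y1), (x2, y2) in combinations(square_grid(m), 2):
        d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
        counts[d2] = counts.get(d2, 0) + 1
    return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))


if __name__ == "__main__":
    for m in (3, 6, 10):
        pts = square_grid(m)
        print(m, count_pairs_at_squared_distance(pts, 1), best_squared_distance(m))
```

On a 6×6 grid, for instance, 60 pairs sit at distance 1 but 80 pairs sit at distance √5, so rescaling by 1/√5 already beats the naive choice. The brute-force loop is quadratic in the number of points and is only meant for small grids.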
Eighty years of effort against a problem that can be stated in a single sentence about points and unit distances is a long record of failure, and that asymmetry is part of what makes the result load-bearing rather than a press novelty. Erdős himself returned to the problem repeatedly across his career; the best upper bounds in the surrounding literature, by Spencer, Szemerédi, Trotter, and others, represent decades of human work that did not settle the central conjecture. Per Gigazine’s reporting on the discrete-geometry disproof, what landed on May 20 is not a proof of the original conjecture but a disproof: a counterexample construction that exceeds the bound the conjecture would have implied. Disproofs in this branch of combinatorics are scarcer and harder to find than corollaries, because they require building an object, not just refining an inequality. The model built the object. The next eighty years of unit-distance research start from a different floor, and the wall has a hole that did not exist on May 19.
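For orientation, the classical picture behind this paragraph can be written compactly. Let $u(n)$ denote the maximum number of unit-distance pairs among $n$ points in the plane; the symbol is a common convention and an assumption here, not notation taken from the OpenAI paper. With $c > 0$ a constant and $n$ large, the long-standing bounds are:

$$
n^{\,1 + c/\log\log n} \;\le\; u(n) \;\le\; O\!\left(n^{4/3}\right)
$$

The left side comes from the 1946 rescaled square-grid construction, and the right side is the Spencer–Szemerédi–Trotter upper bound. The conjecture under discussion amounts to $u(n) = n^{1+o(1)}$, meaning the grid is optimal up to the $o(1)$ term in the exponent. A construction with at least $n^{1+\delta}$ unit distances for some fixed $\delta > 0$ would contradict it. This article does not state the exponent the new construction achieves, so that value should be taken from the companion paper.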
How a general-purpose model out-thought eight decades of geometers
The technical anatomy of the discovery matters because the meta-claim depends on it. Per OpenAI’s technical writeup of the proof methodology, the model was not given the unit distance problem as a target. It was given an open problem catalog and free rein to attempt items. The system worked the problem the way a strong mathematician might: sketching candidate constructions, abandoning ones that failed obvious local checks, and escalating to deeper machinery when the elementary tools ran out. The transition from square-grid intuition to algebraic number theory is precisely the kind of jump that distinguishes routine combinatorial bookkeeping from research-grade discovery. It is also precisely the kind of jump that prior AI math systems (including specialized geometry provers, AlphaProof, and the various theorem-proving toolchains) have historically not made unaided. The model was operating as a generalist, and it found a non-obvious move.
The verification paper deserves equal attention because it sets the new floor for how AI math claims will be evaluated going forward. Per the arXiv preprint of the verification effort, nine mathematicians signed the companion document: Noga Alon, Thomas Bloom, Tim Gowers, Daniel Litt, Will Sawin, Arul Shankar, Jacob Tsimerman, Victor Wang, and Melanie Matchett Wood. The list is conspicuously load-bearing. Gowers holds a Fields Medal. Alon is one of the most cited combinatorists alive. Bloom is the maintainer of the Erdős problems site who debunked the October 2025 claim. Their job was to translate the model’s proof into human-readable form, verify it against the surrounding literature on Ellenberg–Venkatesh and Hajir–Maire–Ramakrishna class-field results, and confirm the construction does what the model says. They did. That signature, more than any benchmark number, is the artifact that distinguishes this announcement from the October retraction.
The third piece of the puzzle is Terence Tao’s reaction, which is doing serious heavy lifting in the press cycle. Per coverage by AIBase of the academic response, Tao — widely regarded as one of the greatest living mathematicians — read the proof and called it “a meaningful contribution to the anatomy of integers that goes well beyond the solution of this particular Erdős problem,” then proceeded to sketch an extension that turns the model’s construction into a seed for a more general theory. That extension is, in some sense, the strongest endorsement possible. A casual nod from Tao would be polite. An extension by Tao is a working mathematician’s signal that the underlying ideas are productive, not just correct. Per Gil Kalai’s writeup on his Combinatorics and More blog, the broader research community appears to be taking this seriously enough to start citing the result in adjacent papers within days, not months — the kind of velocity that suggests it will not unwind under deeper scrutiny.
The model’s process is the part that should worry and excite operators in roughly equal measure. The system was not given a curriculum on class field towers. It was given access to the same mathematical literature any LLM-tier system has, plus the standard general-purpose reasoning loop OpenAI uses for its frontier models. The discovery is therefore a data point on what general reasoning, scaled, can do without specialization. Per Interesting Engineering’s coverage of the problem and the proof, this is consistent with a pattern across 2025 and early 2026: math benchmark scores have been climbing faster than any other category of evaluation, and the frontier labs have been quietly promising that the reasoning systems would eventually produce original contributions. The promise has been redeemed, once. The question every research lab is now asking is whether that “once” extends to a “twice” — and on what cadence.
The financial and competitive subtext is hard to miss. Per CNBC’s coverage of OpenAI’s $122 billion funding round at an $852 billion valuation, the company is currently raising and deploying capital at a scale that requires evidence of frontier capability, not just enterprise revenue. A verified mathematical discovery, particularly one co-signed by a Fields Medalist, is exactly the kind of evidence that supports the investor narrative that reasoning models are not plateauing. It also lands during a quarter in which OpenAI is filing a confidential S-1 with the SEC, per coverage of the firm’s IPO preparation in OpenTools’ filing report, and the timing of the math result will not have escaped the company’s communications team. Verified original contributions to open mathematics are durable artifacts. They survive the news cycle. They go into the prospectus. They become the answer to the question every public-market analyst asks: what is this $852 billion paying for?
The competitive context is the other half of the subtext. Per TechCrunch’s reporting on the Anthropic–xAI compute deal, Anthropic has just committed to spending roughly $1.25 billion per month for Colossus compute through 2029. Per CNBC’s coverage of Anthropic’s Q2 financial projections, Anthropic is on track to clear $10.9 billion in Q2 revenue with a projected $559 million operating profit. Both labs are spending and earning at scale, and both have been racing to demonstrate that reasoning systems can produce novel scientific output. OpenAI now has a single, verified, citable discovery. Anthropic has Claude Code and a fast-growing research division but no equivalent public benchmark moment. The asymmetry will pressure both companies — OpenAI to extend the result, Anthropic to produce one of its own. The result is therefore not just a math story. It is a competitive marker that will reshape how both labs allocate research compute over the next twelve months.
| Marker | Detail | Context or status |
|---|---|---|
| Problem | Erdős unit distance, 1946 | 80-year-old open question |
| Method | Class field towers | Algebraic number theory |
| Verifying mathematicians | 9, incl. Fields Medalist | Co-authored 19-page paper |
| Comparable prior AI math claim | GPT-5, October 2025 | Retracted within 48 hours |
The table compresses the contrast that makes the new result load-bearing. October’s claim was generic, unverified, and collapsed on first contact with the canonical-problem maintainer. May’s claim is specific, peer-translated, and arrived with a co-authored paper from nine mathematicians whose reputations would not survive endorsing a flawed proof. The bar for “verified AI mathematical contribution” has been set, in public, by people who can credibly defend it. Future labs claiming similar results will be compared against this version, not the October version.
The asterisks every mathematician is whispering about
The first counterpoint to “AI did original mathematics” is that the result is, structurally, one data point. Per TechCrunch’s careful framing of the announcement, what the model produced is a counterexample construction that disproves a conjecture — important, but a different kind of mathematical work than proving a positive theorem. The model demonstrated the conjecture was wrong; it did not establish what the right answer is. The problem has a new floor and the conjecture has died, but the field has not been closed. Future progress will determine whether the model’s construction is itself near-optimal or whether it, too, gets beaten by something stranger. A second data point — preferably one that takes a different shape from this one — is what would convert this from anecdote into trend.
The second counterpoint is sampling bias in how AI math claims reach the public. Per QuantumZeitgeist’s analysis of the polynomial gain on unit-distance pairs, labs running large reasoning models can attempt thousands of open problems in parallel. Most attempts fail. A few produce candidate constructions that on inspection turn out to be wrong, or already known, or trivially extendable to a counterexample by humans who recognize the move. The reporting pipeline filters aggressively for the ones that survive verification. That filtering is appropriate, since only verified results should be announced, but it obscures the base rate. If a frontier reasoning model attempts 10,000 open problems and produces one verifiable original contribution, the headline is real but the per-attempt success rate (one in 10,000 in that hypothetical) is not the kind of number that scales operations. The October retraction is informative on this point: it showed how easy it is for the same architecture to confabulate the appearance of a contribution. The May result shows the architecture can also produce a real one. The ratio between those two outcomes matters, and we do not have it.
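The base-rate point can be made concrete with a short calculation. The sketch below uses the paragraph's own hypothetical of one verified result in 10,000 attempts (not a disclosed figure) and computes a Wilson score interval for the per-attempt success rate. The choice of the Wilson interval and the function name are assumptions made for illustration.

```python
import math


def wilson_interval(successes, attempts, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must lie between 0 and attempts")
    p_hat = successes / attempts
    denom = 1 + z * z / attempts
    centre = (p_hat + z * z / (2 * attempts)) / denom
    half = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / attempts + z * z / (4 * attempts**2))
        / denom
    )
    # Clamp to [0, 1] to absorb floating-point noise at the edges.
    return max(0.0, centre - half), min(1.0, centre + half)


if __name__ == "__main__":
    # Hypothetical numbers from the text: 1 verified result in 10,000 attempts.
    low, high = wilson_interval(1, 10_000)
    print(f"point estimate: {1 / 10_000:.6f}")
    print(f"~95% interval:  {low:.6f} to {high:.6f}")
```

With a single success the interval runs from roughly 0.002% to 0.06% per attempt, which spans more than an order of magnitude. That width is the practical meaning of "we do not have the base rate": one verified result constrains the success rate only loosely.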
The third counterpoint is the question of leakage. Per the Cryptobriefing technical breakdown of the proof, the algebraic-number-theory machinery the model deployed is well-documented in the training corpus, and the question is whether the model rediscovered a connection that was implicit in the literature but never explicitly drawn. The nine-mathematician verification paper addresses this — the connection had not been drawn, and the specific use of Golod–Shafarevich to establish the existence of the required class field towers in this geometric context is novel. But the broader point lingers: as training corpora become more comprehensive, the line between “rediscovery from latent knowledge” and “original synthesis” gets harder to police. The verifiers’ job is increasingly to determine which side of that line a given output sits on, and reasonable mathematicians will disagree about edge cases. The May result appears to be on the right side, but the cottage industry of post-hoc adjudication is going to grow.
The fourth counterpoint is about cost and replicability. OpenAI has not disclosed how much compute the model burned to produce the unit-distance construction, how many failed attempts preceded the successful one, or whether the system was given any human-provided hints during the search. The proof itself is short; the construction process is opaque. Per the OpenAI release page on the discrete-geometry result, the company describes the model as “general-purpose” and the result as “autonomous,” but those terms are doing significant work, and outside observers cannot easily audit them. If the discovery cost the equivalent of a small institute’s annual budget in compute, the practical implications for scientific research are different than if it cost a few thousand dollars of inference. The economics of AI-driven discovery have not been disclosed, and the gap between “AI can produce mathematical discoveries” and “AI can produce mathematical discoveries at a price point that materially changes research productivity” is large. The May result establishes the former. It does not establish the latter.
The fifth counterpoint is the one mathematicians are talking about in private: the discovery may not map to other scientific domains. The unit distance problem is a clean, low-dimensional, combinatorial question — the kind of compact statement that can be fully specified to the system. Most open scientific problems are not like this. A drug-discovery question is not a clean conjecture; a materials-science question is not a closed-form combinatorial statement. The bridge from “solves clean math problems” to “accelerates messy empirical science” remains a bridge, not a completed structure. Per OpenAI’s introduction of GPT-Rosalind for life sciences, the company is already attempting that crossing on the pharmaceutical side, but those results have not yet generated comparable independent verification (see also our coverage in /posts/2026-04-17-openai-gpt-rosalind-drug-discovery-life-sciences/).
The sixth counterpoint is structural. The proof was generated by the model; the paper was co-authored by nine humans. Per Scientific American’s discussion of publication norms, the major math journals have not yet established stable conventions for AI co-authorship on novel proofs. The arXiv preprint serves as the public record for now. The peer-reviewed publication will require a longer process, and that process will produce policy norms constraining how future AI contributions are credited and adjudicated. How this gets resolved will shape the institutional response to the next dozen results like it.
What science looks like when AI joins the authorship list
The outlook is now bifurcated, and operators in research-adjacent industries need to plan for both branches. The optimistic branch is that this is the first of many. Per Stanford HAI’s 2026 AI Index findings on reasoning model progress, the rate of progress on mathematical benchmarks accelerated through 2025 and into 2026, with frontier reasoning models clearing thresholds the previous generation could not approach. If the trajectory continues, the unit-distance disproof becomes the first verified entry in a category that grows by tens of results per year. Pharmaceutical research, materials science, theoretical physics, and structural biology all contain backlogs of clean-enough open problems that a sufficiently capable general reasoning model could productively attempt. Per FierceBiotech’s coverage of OpenAI’s biotech-specific model, the infrastructure to attempt those problems is already being assembled at the major labs. The optimistic branch is therefore not science fiction. It is a roadmap with one validated stop on it.
The pessimistic branch is that this result is a high-water mark, not a starting line. Per StartupHub.ai’s coverage of the broader pattern of AI research claims, the field has seen the same pattern with image generation, with code generation, and with reasoning benchmarks: a dramatic early result followed by a long plateau as the system runs out of clean problems and starts running into messy ones. The unit distance problem was tractable because it was small, well-defined, and recently re-popularized by the Erdős problems site. Many problems that are interesting to humans are interesting precisely because they are messy and underspecified. If the model’s success was partly because the problem was unusually amenable, the next dozen attempts may not produce comparable wins. The pessimistic branch is also a coherent reading of the result. The honest answer is that no one knows yet which branch we are on, and the next six months of attempted replications across other open problems will tell us.
The financial implications for OpenAI specifically are non-trivial. Per CNBC’s coverage of OpenAI’s funding round, the company’s $852 billion valuation rests on a thesis that includes both enterprise revenue growth and scientific capability. The math result feeds the second pillar directly. Per Yahoo Finance’s reporting on the round’s structure, $35 billion of Amazon’s participation is contingent on either an IPO or attainment of AGI. The unit-distance disproof is not AGI, but it is the kind of qualitative milestone that influences how AGI-related contingencies will be adjudicated by partners and analysts. Public confidence that frontier reasoning models can produce verified original contributions to mathematics is the marketing material for the IPO road show. The S-1 filing this week and the math result last week are not coincidence; they are the same communications strategy.
The implications for the AI math community are more direct. Per coverage on the institutional response by the Combinatorics and More blog, several major math departments are now formalizing protocols for evaluating AI-generated proofs. The protocols will need to address verification cost (proofs are easy to generate, hard to check), authorship (who counts as a co-author of a model-generated proof), and reproducibility (the model’s process is not transparent in the way a human’s notebook is). Per the arXiv companion paper on the verification methodology, the nine-mathematician approach worked for this proof but will not scale to a torrent of submissions. Some combination of automated verification (Lean, Coq) and lighter human review will become standard. The professional infrastructure that surrounds peer-reviewed mathematics is about to undergo more change in the next two years than it has in the previous fifty.
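As a flavour of what machine checking means, here is a deliberately tiny Lean 4 example. It is a toy with no connection to the actual unit-distance proof, and nothing in this article says that proof has been formalised. The statement is that an m × m grid with m = 3 has 2·m·(m − 1) = 12 pairs of horizontally or vertically adjacent points, and the checker accepts it only because both sides evaluate to the same natural number.

```lean
-- Toy statement: 2 * m * (m - 1) with m = 3 equals 12.
-- `rfl` succeeds only if both sides reduce to the same natural number.
example : 2 * 3 * (3 - 1) = 12 := rfl

-- A false variant such as `2 * 3 * (3 - 1) = 13` would be rejected
-- when the file is checked, which is the property reviewers rely on.
```

The appeal for reviewers is that acceptance does not depend on anyone's reading stamina; the cost is that stating a research-level result formally is usually far harder than this one-liner suggests.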
Operators in adjacent industries should be doing the following, in roughly this order:
- Track replication attempts: the next 90 days of public AI math claims will tell us whether the May 20 result is the first of many or a one-off. Watch the Erdős Problems site and Quanta Magazine for verified entries rather than vendor press releases, because the verification chain is what made this announcement durable and what will distinguish future credible claims.
- Audit your research workflows for clean-problem candidates: if your organization has well-defined, low-dimensional open problems sitting on a shelf because they are too small to justify a dedicated headcount, those are the candidates a reasoning model is likeliest to make progress on. Sketch a one-page brief on each. The cost of the experiment is low; the upside is asymmetric.
- Establish AI co-authorship and verification policy now, not later: any organization that produces scientific or technical output should write down its position on AI-generated contributions before it has to. Per Scientific American’s discussion of the authorship problem, the journals do not have stable norms yet. The labs and corporates that establish their own first will set the de facto convention for their domains.
- Budget for verification, not just generation: the lesson of the October 2025 GPT-5 retraction is that AI-claimed results are cheap to produce and expensive to verify. Any pipeline that treats model output as the deliverable rather than the input to verification will produce the next retraction. Build the human-in-the-loop verification capacity before you need it.
- Separate the math result from the IPO narrative: the unit-distance disproof is a real scientific contribution. It is also a piece of the OpenAI IPO communications strategy. Both can be true at once. Operators evaluating investment, partnership, or procurement decisions should consume the underlying mathematical reality, not the press cycle wrapped around it. The reality is impressive enough on its own.
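The "budget for verification" item above can be sketched as a gate that model output must pass before anything is reported. This is an illustrative design, not a description of any lab's pipeline: the `Candidate` type, the check functions, and the problem identifiers are all hypothetical, and real checks would call a literature search, a proof assistant, or a human review queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Candidate:
    """A model-produced claim awaiting verification (hypothetical type)."""
    problem_id: str
    claim: str
    notes: List[str] = field(default_factory=list)


# A check returns (passed, reason).
Check = Callable[[Candidate], Tuple[bool, str]]


def not_already_known(known_results: Set[str]) -> Check:
    """Reject claims on problems that already have a published solution."""
    def check(c: Candidate) -> Tuple[bool, str]:
        if c.problem_id in known_results:
            return False, "already in the literature"
        return True, "no prior solution found"
    return check


def has_human_signoff(signoffs: Dict[str, List[str]], required: int = 2) -> Check:
    """Require a minimum number of named reviewers per problem."""
    def check(c: Candidate) -> Tuple[bool, str]:
        n = len(signoffs.get(c.problem_id, []))
        return n >= required, f"{n} of {required} reviewer sign-offs"
    return check


def run_gate(candidate: Candidate, checks: List[Check]) -> bool:
    """Apply checks in order, recording reasons; stop at the first failure."""
    for check in checks:
        passed, reason = check(candidate)
        candidate.notes.append(reason)
        if not passed:
            return False
    return True


if __name__ == "__main__":
    checks = [
        not_already_known({"problem-A"}),
        has_human_signoff({"problem-B": ["reviewer-1", "reviewer-2"]}),
    ]
    for cand in (Candidate("problem-A", "claimed solution"),
                 Candidate("problem-B", "claimed construction")):
        print(cand.problem_id, run_gate(cand, checks), cand.notes)
```

Putting the literature check first mirrors the October failure mode described earlier, where the claimed solutions were already published. Ordering cheap checks before expensive human review is a design choice, not a requirement.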
In other news
Anthropic projects first profitable quarter on $10.9B Q2 revenue: Anthropic is on track to post its first operating profit this quarter, with an expected operating profit of roughly $559 million on projected Q2 2026 revenue of $10.9 billion, more than double the $4.8 billion in revenue the company recorded in Q1. The projection arrives two years ahead of internal targets and reframes Anthropic from cash-burn frontier lab to live IPO candidate (CNBC).
Anthropic commits $1.25B/month to xAI for compute through 2029: Anthropic agreed to pay xAI roughly $1.25 billion per month for Colossus supercomputer access through May 2029, with a discounted ramp-up rate for the first two months. The deal could deliver xAI more than $40 billion in total revenue and is the largest single compute commitment between two AI labs on record (TechCrunch).
OpenAI files confidential S-1 with SEC: OpenAI filed a confidential S-1 with the SEC ahead of a targeted September 2026 IPO at a valuation range of $852 billion to $1 trillion, which at the upper bound would be the largest technology public offering in history. CFO Sarah Friar has confirmed retail-investor share allocations as part of the structure (OpenTools).
Google’s TurboQuant cuts LLM inference memory by 6x at ICLR 2026: Google Research presented TurboQuant at ICLR 2026, a training-free, model-agnostic compression algorithm that shrinks the KV cache to 3 bits per element (a 6x memory reduction) with zero measurable accuracy degradation and up to 8x attention speedup. It is described as the biggest single-shot inference-cost improvement in eighteen months (Medium).
Meta begins 8,000-employee AI restructuring layoffs: Meta started implementing approximately 8,000 layoffs, about 10% of its workforce, as part of an AI-driven restructuring aimed at consolidating spend into the company’s $125–145 billion 2026 AI capex plan, while redirecting roughly 7,000 remaining employees into newly created AI-focused teams. The cuts hit non-AI engineering and content moderation hardest, consistent with the pattern of incumbents redirecting headcount budget into compute (Quartz).
David Silver’s Ineffable Intelligence closes record $1.1B seed: Former DeepMind reinforcement-learning lead David Silver’s new lab Ineffable Intelligence broke out of stealth with a $1.1 billion seed round at a $5.1 billion valuation, the largest seed round in European history. The round was co-led by Sequoia and Lightspeed, with participation from Nvidia, DST Global, Index, Google, and the U.K. Sovereign AI Fund. Silver has pledged his personal proceeds to high-impact charities (TechCrunch).