TECH & SPACE

Claude Opus 4.6 found a real solution to Knuth’s cycle problem

(4d ago)
San Francisco, US
Simon Willison

Donald Knuth’s note Claude’s Cycles gives the AI industry a rare proof point outside benchmark tables: a model, guided by a human, helped find a construction for a real combinatorics problem. The footnote matters as much as the success: Claude made mistakes, needed supervision, and did not close the full mathematical loop alone.

A glowing cube lattice resolves into three Hamiltonian paths under a distant AI reasoning core. 📷 AI-generated / Tech&Space

Nexus Vale
AI editor. "Believes the first draft of truth is usually buried in the logs."
  • Claude Opus 4.6 did not merely guess an answer: through a series of programmatic explorations, it found a construction for Knuth’s directed Hamiltonian-cycle problem.
  • The first result covered all odd values of m, while later checks, Lean formalization, and follow-up work extended the story to even cases.
  • The real signal is not Anthropic marketing, but a workflow where the model explores, fails, documents progress, and still needs an expert in the loop.

AI stories usually spring leaks the moment they leave the demo stage and meet a real problem. That is why Donald Knuth’s case is more interesting than another leaderboard bump. Simon Willison quoted Knuth on March 3, 2026, but the stronger source is Knuth’s own note, Claude’s Cycles, dated February 28 and revised on April 14, 2026.

Knuth’s problem lives in the place where AI marketing gets very little oxygen: directed graphs, Hamiltonian cycles, and a general construction that has to hold across an infinite family of cases. In short, the task was to decompose the arcs of a directed graph with m^3 vertices into three directed cycles of length m^3. This is not a model writing a plausible paragraph. One wrong detail breaks the claim.
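That is also what makes the claim checkable by machine. As a hedged sketch (the vertex count and the three cycles below are illustrative placeholders, not Knuth’s actual construction on m^3 vertices), a verifier for “three arc-disjoint directed Hamiltonian cycles covering every arc” can be a few lines:

```python
# Hedged sketch: verify that a claimed decomposition of a digraph's arcs
# into three directed Hamiltonian cycles is valid. The toy graph and the
# cycles below are invented for illustration, not Knuth's construction.

def is_hamiltonian_cycle(cycle, n):
    """`cycle` lists n vertices; it must visit every vertex exactly once."""
    return len(cycle) == n and set(cycle) == set(range(n))

def cycle_arcs(cycle):
    """Directed arcs (u, v) the cycle traverses, wrapping at the end."""
    return {(cycle[i], cycle[(i + 1) % len(cycle)]) for i in range(len(cycle))}

def check_decomposition(arcs, cycles, n):
    """True iff the three cycles are Hamiltonian, arc-disjoint,
    and together cover exactly the given arc set."""
    if len(cycles) != 3:
        return False
    if not all(is_hamiltonian_cycle(c, n) for c in cycles):
        return False
    used = [cycle_arcs(c) for c in cycles]
    covered = used[0] | used[1] | used[2]
    disjoint = sum(len(u) for u in used) == len(covered)
    return disjoint and covered == set(arcs)

# Toy example on 5 vertices: three arc-disjoint "step" cycles mod 5.
c1 = [0, 1, 2, 3, 4]   # step +1
c2 = [0, 2, 4, 1, 3]   # step +2
c3 = [0, 3, 1, 4, 2]   # step +3
arcs = cycle_arcs(c1) | cycle_arcs(c2) | cycle_arcs(c3)
print(check_decomposition(arcs, [c1, c2, c3], 5))  # → True
```

Knuth’s graphs have m^3 vertices and cycles of length m^3; the toy above uses five, but the checks (Hamiltonicity, disjointness, coverage) are the same, which is exactly why a wrong detail breaks the claim immediately.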

WHAT WAS ACTUALLY SOLVED

Knuth already had the case m = 3, and Filip Stappers had empirically found solutions for values from 4 through 16. Stappers then gave the problem to Claude Opus 4.6, the model Anthropic introduced on February 5, 2026 as its strongest system for complex reasoning, coding, and longer-running agentic work. The interesting part is not that Claude produced a magical proof. It worked through reformulations, DFS attempts, serpentine patterns, fiber decomposition, and executable experiments.
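The “DFS attempts” and “executable experiments” in that list are ordinary search code, not magic. A minimal sketch of the genre, assuming nothing about the model’s actual method (the circulant test graph below is an invented example), is a backtracking search for one directed Hamiltonian cycle:

```python
# Hedged illustration of the kind of brute-force experiment the note
# describes: plain backtracking DFS for one directed Hamiltonian cycle.
# Generic search code, not the construction Claude or Stappers produced.

def hamiltonian_cycle(adj, n, start=0):
    """Return one Hamiltonian cycle in the digraph `adj` as a vertex list,
    or None. `adj` maps each vertex to a list of its out-neighbours."""
    path = [start]
    visited = {start}

    def dfs(v):
        if len(path) == n:
            return start in adj[v]   # all vertices used: can we close up?
        for w in adj[v]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                if dfs(w):
                    return True
                path.pop()           # dead end: backtrack
                visited.remove(w)
        return False

    return path if dfs(start) else None

# Circulant digraph on 7 vertices with steps +1 and +2 (an invented
# example, chosen only so the toy search has something to find).
n = 7
adj = {v: [(v + 1) % n, (v + 2) % n] for v in range(n)}
print(hamiltonian_cycle(adj, n))  # → [0, 1, 2, 3, 4, 5, 6]
```

Exhaustive search like this scales terribly, which is precisely why the story needed a general construction rather than a bigger computer: empirical checks up to m = 16 say nothing about m = 17.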

The result was a construction that Stappers tested for every odd m from 3 to 101, after which Knuth wrote the mathematical proof for odd values. The note also shows why this is a better signal than a benchmark: the model found a useful pattern, but the route included restarts, errors, lost context, and reminders to document progress. In other words, this was not an autonomous mathematician in a box. It was a powerful research tool under expert supervision.

This is not another benchmark victory, but a rare case where a model helped on a real combinatorics problem, with human guidance and verification doing the unglamorous work.

A holographic graph over a quiet research desk turns chaotic search into three clean loops. 📷 AI-generated / Tech&Space

WHY THIS MATTERS BEYOND A BENCHMARK

The best version of the story is not “AI solved Knuth’s problem.” That is too clean, and therefore less accurate. The better version is sharper: AI, in the hands of a skilled user, found a construction that a human could check, prove, and extend. That is a more serious signal because it moves the discussion from lab metrics to a real workflow: frame the problem, explore the search space, write code, test, discard bad paths, summarize progress, and leave the proof to someone who understands the traps.

The revised version of Knuth’s note makes the victory lap more complicated. Kim Morrison quickly posted a Lean formalization of the proof, and later additions brought in even values of m through further work by other collaborators and models. One thread points to the no-way-labs/residue repository, which documents a broader multi-agent continuation. Knuth’s note becomes less a story about one model and more a sketch of a future research workflow where humans, formal tools, and models attack the same problem from different angles.

For Anthropic, that is more valuable than a normal press release. The Claude Platform release notes talk about adaptive thinking, effort controls, and long-context work, but Knuth’s case shows something more grounded: a model has to stay inside a hard problem long enough to turn failed attempts into useful search. That is the real competitive advantage if it repeats. If it does not, it is just a beautiful anecdote.

The grounded conclusion is better than the hype. Claude Opus 4.6 does not prove that models understand mathematics in the human sense. It does show that the gap between demonstration and deployment is narrowing when the problem is well framed, tests exist, and an expert stays in the loop. Less glamorous than the slogan. Much more interesting.

Claude Opus 4.6 · Knuth’s unsolved problems · AI mathematical reasoning · Hypothesis testing in LLMs · Large language model capabilities