Cadmus Beats GPT-5 for Under $200: 3 Implications

3 Takeaways from a Small AI That Beat GPT-5 for $200

Cadmus is a small-scale program synthesis system whose model can be trained for under $200 of compute.
It reached 100% accuracy on integer arithmetic tasks, versus 95% for GPT-5.
It shows that controlled AI research is possible without large models.

The Potential of Small-Scale AI Shown by Cadmus

An interesting paper was posted to arXiv on February 9, 2026: an AI trained for under $200 beat GPT-5 on a specific task.^[arXiv] The system, called Cadmus, was presented by Russ Webb and Jason Ramapuram.

Cadmus consists of three components: an integer-based virtual machine, a dataset of real programs, and a transformer model. Training the transformer takes less than $200 worth of computing resources.^[arXiv]
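To make the idea of an "integer-based virtual machine" concrete, here is a minimal sketch of a stack machine whose entire state is integers. The opcodes (`PUSH`, `ADD`, `SUB`, `MUL`) and the `run` function are hypothetical illustrations, not Cadmus's actual instruction set, which this article does not describe. One benefit of a deterministic VM like this is that every program has exactly one correct result, so a model's output can be checked exactly.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction format: (opcode, optional integer argument).
# This is NOT Cadmus's real instruction set; it only illustrates a VM
# whose state consists entirely of integers.
Instruction = Tuple[str, Optional[int]]


def run(program: List[Instruction]) -> int:
    """Execute a tiny stack program and return the value on top of the stack."""
    stack: List[int] = []
    for op, arg in program:
        if op == "PUSH":
            if arg is None:
                raise ValueError("PUSH needs an integer argument")
            stack.append(arg)
        elif op in ("ADD", "SUB", "MUL"):
            b = stack.pop()  # right operand is on top of the stack
            a = stack.pop()
            if op == "ADD":
                stack.append(a + b)
            elif op == "SUB":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack[-1]


# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(program))  # 20
```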

Accuracy That Surpassed GPT-5, and Its Context

Cadmus achieved 100% accuracy on integer arithmetic tasks, while GPT-5 reached only 95% on the same tasks.^[arXiv] To be clear, this does not mean Cadmus is generally superior to GPT-5.

It means that a small model built for a specific purpose can beat a large general-purpose model at that purpose. The researchers point out that GPT-5 draws on unknown prior knowledge during inference. For research, that is a limitation: the relationship between training data and performance cannot be analyzed transparently.
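The 100% and 95% figures are accuracy scores. The paper's exact scoring protocol is not described in this article, so the following is only a minimal sketch assuming execution-based scoring: each completed program is run, and it counts as correct only if its result matches the expected value. The postfix token format and the names `eval_postfix` and `execution_accuracy` are hypothetical.

```python
from typing import List, Tuple


def eval_postfix(tokens: List[str]) -> int:
    """Evaluate integer postfix tokens such as ["2", "3", "+"]."""
    stack: List[int] = []
    for tok in tokens:
        if tok in ("+", "-", "*"):
            b = stack.pop()  # raises IndexError if operands are missing
            a = stack.pop()
            if tok == "+":
                stack.append(a + b)
            elif tok == "-":
                stack.append(a - b)
            else:
                stack.append(a * b)
        else:
            stack.append(int(tok))  # raises ValueError on a non-integer token
    if len(stack) != 1:
        raise ValueError("malformed program: expected exactly one result")
    return stack[0]


def execution_accuracy(samples: List[Tuple[List[str], int]]) -> float:
    """Fraction of completed programs whose executed result equals the expected value.

    A completion that fails to execute counts as incorrect.
    """
    correct = 0
    for tokens, expected in samples:
        try:
            if eval_postfix(tokens) == expected:
                correct += 1
        except (ValueError, IndexError):
            pass  # crash or malformed output: scored as wrong
    return correct / len(samples)


# Toy completions (not data from the paper): two correct, one wrong, one malformed.
samples = [
    (["2", "3", "+"], 5),
    (["4", "5", "*"], 20),
    (["9", "1", "-"], 7),
    (["7", "+"], 7),
]
print(execution_accuracy(samples))  # 0.5
```

Scoring by execution rather than by string match means two differently written programs that compute the same answer both count as correct.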

The Barrier to Entry for AI Research is Lowering

The implications of this research are clear. AI research doesn’t necessarily require infrastructure costing millions of dollars. Core topics such as program completion, out-of-distribution behavior, and reasoning ability can be studied with small-scale systems like Cadmus.

With a system like this, researchers can fully control the training data and inspect the model's internals, which is not possible with a large model like GPT-5 whose training data is unknown. That opens the door for university labs and individual researchers as well.
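Full control over training data means an experimenter decides exactly what the model sees and can hold out a region of inputs to study out-of-distribution behavior. The sketch below assumes a toy arithmetic task rather than Cadmus's actual dataset; `make_examples` and the operand ranges are hypothetical. Training problems use operands 0 to 49, while the test set uses only 50 to 99, so any success on the test set cannot come from memorized operands.

```python
import random
from typing import List, Tuple

Example = Tuple[str, int]  # (postfix program text, ground-truth result)


def make_examples(n: int, lo: int, hi: int, seed: int) -> List[Example]:
    """Generate n addition/multiplication problems with operands drawn from [lo, hi]."""
    rng = random.Random(seed)  # fixed seed: the dataset is exactly reproducible
    examples: List[Example] = []
    for _ in range(n):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        op = rng.choice(["+", "*"])
        result = a + b if op == "+" else a * b
        examples.append((f"{a} {b} {op}", result))
    return examples


# In-distribution training data: operands 0..49 only.
train = make_examples(1000, 0, 49, seed=0)
# Out-of-distribution test data: operands 50..99, never seen in training.
ood_test = make_examples(200, 50, 99, seed=1)
```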

Frequently Asked Questions (FAQ)

Q: Is Cadmus generally superior to GPT-5?

A: No. Cadmus surpassed GPT-5 only on integer arithmetic tasks, so a direct comparison with a general-purpose language model is not appropriate. The key point is that a small model designed for a specific purpose can beat a large model in that narrow area. Cadmus's strength lies in research transparency rather than raw performance.

Q: What exactly is program synthesis?

A: Program synthesis is the automatic generation of code from given conditions or examples. You can think of it as the underlying technology for code auto-completion and code generation tools. Cadmus reproduces this process on a small scale, so researchers can analyze its internal workings transparently.
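The simplest way to illustrate synthesis from examples is enumeration: try candidate programs, shortest first, and return the first one consistent with every input/output pair. Note that this is a swapped-in, much simpler technique; Cadmus generates programs autoregressively with a transformer. The four-primitive DSL and the `synthesize` function below are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A hypothetical toy DSL: each primitive maps an integer to an integer.
PRIMITIVES: Dict[str, Callable[[int], int]] = {
    "inc": lambda x: x + 1,
    "dec": lambda x: x - 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
}


def run_pipeline(names: Sequence[str], x: int) -> int:
    """Apply the named primitives to x, left to right."""
    for name in names:
        x = PRIMITIVES[name](x)
    return x


def synthesize(examples: List[Tuple[int, int]], max_len: int = 3) -> Optional[List[str]]:
    """Return the shortest pipeline of primitives consistent with every (input, output) pair."""
    for length in range(1, max_len + 1):
        # Enumerate every sequence of primitives of this length.
        for names in product(PRIMITIVES, repeat=length):
            if all(run_pipeline(names, i) == o for i, o in examples):
                return list(names)
    return None  # no program found within the search bound


print(synthesize([(1, 4), (2, 6), (5, 12)]))  # ['inc', 'double']
```

The search space grows exponentially with program length (4^k candidates of length k here), which is one reason learned generators are studied instead of brute-force search.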

Q: Can anyone reproduce this experiment for $200?

A: According to the paper, training Cadmus's transformer model takes less than $200 worth of computing resources. That budget is within reach of graduate students or individual researchers using cloud GPUs. Understanding the whole system, however, requires background in areas such as virtual machine design and dataset construction.

If you found this article useful, please subscribe to AI Digester.

References

A Small-Scale System for Autoregressive Program Synthesis Enabling Controlled Experimentation – arXiv (2026-02-09)
GPT-5 Model Overview – OpenAI (2025)
Program Synthesis – Wikipedia (2026)