Cadmus Beats GPT-5 with $200 — 3 Implications

3 Takeaways from a Small AI That Beat GPT-5 for $200

  • Cadmus is a small-scale program synthesis system that can be trained for under $200.
  • It reached 100% accuracy on integer arithmetic tasks, surpassing GPT-5’s 95%.
  • It proved that controlled AI research is possible without large models.

The Potential of Small-Scale AI Shown by Cadmus

An interesting paper was published on arXiv on February 9th. An AI trained for under $200 beat GPT-5 in a specific task.[arXiv] This is Cadmus, a system presented by Russ Webb and Jason Ramapuram.

Cadmus consists of three components: an integer-based virtual machine, a dataset of real programs, and a transformer model. All of it can be trained with computing resources costing less than $200.[Cadmus Paper]
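To make the first component concrete, here is a minimal sketch of what an integer-based virtual machine looks like. The opcode names and instruction format below are invented for illustration; they are not Cadmus’s actual instruction set.

```python
# Minimal integer stack machine (illustrative only; the opcode set is an
# assumption, not Cadmus's actual ISA).
def run(program, inputs):
    """Execute a list of (op, arg) pairs over an integer stack."""
    stack = list(inputs)
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SUB":
            b, a = stack.pop(), stack.pop()
            stack.append(a - b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# Computes (3 + 4) * 2
prog = [("PUSH", 3), ("PUSH", 4), ("ADD", None), ("PUSH", 2), ("MUL", None)]
print(run(prog, []))  # 14
```

A machine this small is the point: every program, every training example, and every output is fully inspectable, which is exactly the transparency argument the paper makes.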

Accuracy That Surpassed GPT-5, and Its Context

Cadmus recorded 100% accuracy in integer arithmetic tasks. GPT-5 only achieved 95% in the same tasks.[arXiv Paper] Don’t misunderstand. This doesn’t mean Cadmus is generally superior to GPT-5.

It means that a small-scale model designed for a specific purpose can beat a general-purpose large model. The researchers pointed out that GPT-5 draws on unknown prior knowledge during the inference process. This is a limitation because the relationship between training data and performance cannot be analyzed transparently.

The Barrier to Entry for AI Research is Lowering

The implications of this research are clear. AI research doesn’t necessarily require infrastructure costing millions of dollars. Core topics such as program completion, out-of-distribution behavior, and reasoning ability can be studied with small-scale systems like Cadmus.

Researchers can fully control the training data and transparently inspect the model, which is impossible with large models. This opens doors for university labs and individual researchers as well.

Frequently Asked Questions (FAQ)

Q: Is Cadmus generally superior to GPT-5?

A: No. Cadmus only surpassed GPT-5 in the specific task of integer arithmetic. It is not appropriate to directly compare it to a general-purpose language model. The key is that a small-scale model designed for a specific purpose can beat a large model in a specific area. Cadmus’s strength lies in research transparency rather than performance.

Q: What exactly is program synthesis?

A: Program synthesis is a technology where AI automatically generates code based on given conditions or examples. You can think of it as the underlying technology for code auto-completion or code generation tools. Cadmus is a system that reproduces this process on a small scale, allowing researchers to transparently analyze internal operations.
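The idea can be illustrated with a toy enumerative search: given input-output examples, enumerate candidate programs until one fits all of them. The tiny grammar below (a single operation plus a constant) is an invented simplification, not Cadmus’s setup.

```python
from itertools import product

# Toy enumerative program synthesis: search tiny arithmetic expressions
# until one matches every input-output example. The grammar is illustrative.
OPS = {"add": lambda x, c: x + c, "mul": lambda x, c: x * c}

def synthesize(examples, max_const=10):
    """Return (op_name, constant) whose function maps each input to its output."""
    for name, const in product(OPS, range(-max_const, max_const + 1)):
        fn = OPS[name]
        if all(fn(x, const) == y for x, y in examples):
            return name, const
    return None  # no program in the grammar fits

# Examples consistent with f(x) = x * 3
print(synthesize([(1, 3), (2, 6), (5, 15)]))  # ('mul', 3)
```

Real synthesis systems replace this brute-force loop with learned models or smarter search, but the contract is the same: examples in, program out.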

Q: Can anyone reproduce this experiment for $200?

A: According to the paper, training Cadmus’s transformer model requires computing resources costing less than $200. With cloud GPUs, graduate students or individual researchers can reproduce it sufficiently. However, related knowledge is required to understand the entire system, such as virtual machine design and dataset construction.


If you found this article useful, please subscribe to AI Digester.


PAN 2026, Summary of 5 AI Text Detection Tasks [2026]

How to Detect AI-Generated Text — Key Takeaways from PAN 2026

  • PAN 2026 announced 5 tasks related to AI-generated text detection.
  • Two new tasks were added: text watermarking and inference trajectory detection.
  • It’s an academic benchmark with over 1,100 submissions since 2012.

5 Tasks Covered by PAN 2026

PAN is a workshop dealing with text forensics. This year, it presented 5 tasks.[arXiv]

The first is Voight-Kampff Generative AI Detection. It distinguishes between AI-written and human-written text, and detection must still work when the text has been obfuscated.

The second, newly established task is Text Watermarking. It involves embedding invisible markers in AI text and verifying their resistance to attacks.[PAN 2026]

From Author Analysis to Inference Trajectory

The third is Multi-Author Style Analysis. It finds the points where the author changes within a document.

The fourth is Generative Plagiarism Detection. Given text that an AI produced while drawing on a source document, the task is to trace the output back to that original source.

The fifth, newly established task is Inference Trajectory Detection. It identifies the source of the LLM’s reasoning process and detects safety issues.[arXiv]

How to Participate and Outlook

Submit your model as a Docker container, and it will be automatically evaluated on the TIRA platform.[PAN]

As AI-generated content surges, detection technology is growing in importance, particularly for educational institutions and the media industry.

Frequently Asked Questions (FAQ)

Q: Can anyone participate in PAN 2026?

A: Yes, not only academic researchers but also industry professionals can participate. Submit your model as a Docker container, and it will be automatically evaluated on TIRA. You only need to register for the CLEF conference, and team participation is also possible.

Q: What is obfuscation in the Voight-Kampff task?

A: It’s a technique to disguise AI-generated text as if it were written by a human. This includes paraphrasing, style transformation, and word substitution. PAN 2026 requires models that can detect even these types of texts.
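Word substitution, the simplest of these techniques, can be sketched in a few lines. The synonym table here is invented for illustration; real obfuscators use far richer paraphrasing and style transfer.

```python
# Toy word-substitution obfuscation (illustrative): swapping common words
# for synonyms perturbs the statistical fingerprint detectors rely on.
SYNONYMS = {"use": "utilize", "help": "assist", "show": "demonstrate"}

def obfuscate(text):
    """Replace each word that has a synonym entry; leave the rest untouched."""
    return " ".join(SYNONYMS.get(word, word) for word in text.split())

print(obfuscate("we show how to use this"))
# "we demonstrate how to utilize this"
```

Even changes this shallow can shift word-frequency statistics, which is why PAN evaluates detectors against obfuscated text explicitly.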

Q: What is the principle behind text watermarking?

A: It’s a technology that inserts statistically detectable patterns when AI generates text. It’s invisible to the human eye but detectable by algorithms. Both insertion accuracy and attack robustness are evaluated.
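One common family of schemes (the “green list” approach popularized by Kirchenbauer et al.) illustrates the principle: a keyed hash of the preceding token marks roughly half the vocabulary as “green,” watermarked generation favors green tokens, and detection simply measures the green fraction. The sketch below is a simplified illustration, not PAN’s evaluation code.

```python
import hashlib

# Simplified "green list" watermark detection sketch (illustrative).
def is_green(prev_token, token):
    """Deterministically assign about half of token pairs to the green list."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens):
    """Fraction of tokens on the green list given their predecessor."""
    pairs = list(zip(tokens, tokens[1:]))
    hits = [is_green(prev, tok) for prev, tok in pairs]
    return sum(hits) / len(hits)

# Unwatermarked text hovers near the ~0.5 baseline; watermarked generation,
# which preferentially samples green tokens, scores significantly higher.
```

Attack robustness then means the green fraction should stay elevated even after paraphrasing or word substitution, which is what the task evaluates.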




OpenAI Fires Executive Opposed to Adult Mode — The Real Reason is [2026]

OpenAI Policy VP Fired: 3 Key Issues Summarized

  • OpenAI fired its VP of Policy, Ryan Beiermeister, in January.
  • The reason given was allegations of gender discrimination against a male employee.
  • She was the one who opposed the launch of ChatGPT’s adult mode.

Executive Who Opposed ChatGPT Adult Mode Fired

According to the WSJ, OpenAI fired VP Ryan Beiermeister in January 2026[WSJ/Techmeme]. The reason cited was gender discrimination against a male employee. She denies the allegations[TechCrunch].

Since she was the one who opposed the launch of ChatGPT’s adult mode, questions are being raised about the real reason for her dismissal.

What is Adult Mode?

It’s a feature that allows sexually explicit conversations for age-verified adults. Sam Altman unveiled the concept in October 2025[Decrypt], and CPO Fidji Simo confirmed its release in Q1 2026.

OpenAI has shifted to the position that “AI shouldn’t be the morality police.” It seems that xAI Grok’s allowance of adult content has acted as competitive pressure[Cybernews].

What the Firing Controversy Suggests

OpenAI stated that “the termination is unrelated to the issues she raised.” However, a pattern is emerging where employees who raise safety concerns are ultimately pushed out.

The role of policy officers in AI companies is becoming more difficult. The balance between monetization pressure and safety is becoming increasingly complex.

Frequently Asked Questions (FAQ)

Q: When will OpenAI’s adult mode be released?

A: CPO Fidji Simo confirmed its release in Q1 2026. They are implementing an age verification system and it will only be available to those 18 and older. The specific date is undisclosed, but age verification feature testing is underway.

Q: Who is Ryan Beiermeister?

A: She served as VP of Product Policy at OpenAI from June 2024. She was responsible for content management and safety policies. She was fired in January 2026 on allegations of gender discrimination, which she denies.

Q: Will this incident affect the release of adult mode?

A: It is unlikely to directly affect the release schedule. OpenAI has already officially confirmed the release. However, external scrutiny of internal policy decisions may be intensified.




Claude Code v2.1.39, 5 Fixes Including Terminal Performance Improvements [2026]

Claude Code v2.1.39 Update: Top 5 Highlights

  • Improved terminal rendering performance for faster screen output
  • Fixed a bug where critical errors were being swallowed
  • Resolved an issue where processes would freeze after session termination

5 Issues Anthropic Fixed

Anthropic released Claude Code v2.1.39 on February 10, 2026. This patch focuses on stability and terminal rendering without introducing any new features.[GitHub]

The biggest change is the improved terminal rendering performance. Since Claude Code is a terminal-based AI coding tool, screen output speed directly impacts perceived performance.[Release Notes]

Detailed Error Handling and Stability Fixes

A problem where critical errors were being swallowed without being displayed to the user has been fixed. Previously, even if a serious error occurred, it wouldn’t appear on the screen, making debugging difficult.

The issue of processes freezing after session termination has also been resolved. Bugs related to character truncation at terminal boundaries and blank lines in detailed transcripts were also fixed.[GitHub]

Implications for the AI Coding Tool Market

Claude Code is Anthropic’s terminal-based AI coding assistant. It competes with Copilot and Cursor, differentiating itself with its terminal environment.[Anthropic Docs]

While it lacks flashy new features, this update refines the stability that is most important in developer tools. You can easily update via npm or brew. Community feedback has been positive.[GitHub]

Frequently Asked Questions (FAQ)

Q: How do I update to Claude Code v2.1.39?

A: If you’re using npm, update with npm update -g @anthropic-ai/claude-code. If you’re using Homebrew on macOS, you can use brew upgrade claude-code. After updating, check the installed version with claude --version.

Q: Does this update include new AI features?

A: No. v2.1.39 is a patch release focused on bug fixes and performance improvements without any new features. It includes 5 stability-focused changes, such as improved terminal rendering performance, error display fixes, and process freeze resolution.

Q: Is Claude Code free to use?

A: Claude Code requires an Anthropic API key and is billed based on usage. Claude Pro or Max subscribers can also use it with their subscription allowance. It can be run directly from the terminal without a separate IDE extension.




Amazon Prepares AI Content Marketplace [2026]

Amazon AI Content Marketplace: 3 Key Takeaways

  • Amazon is preparing a marketplace to sell news organization content to AI companies.
  • It’s the second big tech content licensing platform, following Microsoft.
  • It will be a new revenue stream for news organizations and a legitimate learning data channel for AI companies.

Amazon’s Content Trading Platform Vision

Amazon is planning a marketplace to mediate content licensing between news organizations and AI companies. The Information first reported the plan, and slides tied to an AWS conference have been distributed within the publishing industry.[TechCrunch]

This marketplace was introduced alongside Amazon Bedrock and QuickSight. Amazon already operates individual contracts paying the New York Times over $20 million annually.[WinBuzzer]

Big Tech Content Licensing Competition

Amazon is jumping in after Microsoft announced its Publisher Content Marketplace last week. This comes amid news organizations’ backlash against the unauthorized use of content by generative AI.[PYMNTS]

Whether content licensing will become an industry standard remains to be seen.

Frequently Asked Questions (FAQ)

Q: When will the Amazon AI Content Marketplace launch?

A: The official launch date is undecided. Slides have been distributed ahead of the AWS conference, and discussions are underway with publishing industry executives.

Q: What are the differences from the Microsoft platform?

A: Both have a structure for licensing news organization content to AI companies. Amazon is likely to be integrated with AWS Bedrock, while Microsoft is Azure-centric.

Q: How much revenue will news organizations earn?

A: The pricing structure is undisclosed. Amazon has an individual contract with the New York Times for over $20 million annually, so a similar scale is expected.




3 Issues with the Emergence of AI Music in Olympic Ice Dancing

3 Key Issues with AI Music at the Olympics Ice Dance

  • A Czech brother-sister duo used AI-generated music for their rhythm dance at the Milan Olympics.
  • Plagiarism issues arose from the AI-created song, resembling a 90s hit.
  • The ISU allows AI music, but debates over artistry are intensifying.

Czech Siblings Take the Olympic Stage with AI Music

In the rhythm dance of the 2026 Milan-Cortina Winter Olympics ice dancing competition, the Czech siblings Mrazkova and Mrazek performed to AI-generated music.[TechCrunch] The rhythm dance theme was 90s music, and while other competitors chose Jennifer Lopez or the Backstreet Boys, they opted for an AC/DC-style AI track.

Plagiarism Erupts from AI Song

In the initial version, lyrics from New Radicals’ 1998 hit “You Get What You Give” were used almost verbatim.[Yahoo Sports] The lyrics were later modified, but traces of the original song remained in the guitar riffs and elsewhere.

This duo chose AI after experiencing copyright issues last season, creating the irony of AI committing plagiarism.[BraveWords]

The Boundary Between Artistry and Technology

The ISU does not prohibit AI music. However, criticism has poured in, stating that “90% of this sport is artistry and creativity.”[Newsweek] The duo scored 72.09 points (40.50 for technical elements, 31.59 for composition), placing them 17th. They have not revealed whether they will continue to use AI music.

Frequently Asked Questions (FAQ)

Q: Is the use of AI music allowed in the Olympics?

A: AI music is not prohibited under ISU regulations. Using copyrighted material without permission, however, is a violation. This incident may trigger discussions about AI music guidelines, and regulations will need to keep pace with the technology.

Q: Why does AI music plagiarize existing songs?

A: Music-generating AI learns from existing music data. The melodies, lyrics, and harmonic progressions of the training data can directly appear in the output. The more famous a song is, the greater its influence on the training, increasing the likelihood of plagiarism. This is a fundamental limitation of AI-generated content.

Q: How did the Czech duo perform?

A: They scored 72.09 points in the rhythm dance, placing 17th. This is a combination of 40.50 points for technical elements and 31.59 points for composition. While far from medal contention, it is said to be their personal best. The free dance remains, so the final ranking may change.




ChatGPT Goes to 3 Million US Soldiers — Joining GenAI.mil [2026]

ChatGPT Rolls Out to 3 Million US Military Personnel — 3 Key Takeaways

  • OpenAI’s ChatGPT has been added to the US Department of Defense’s GenAI.mil platform.
  • Accessible to 3 million military personnel, with 1.1 million currently active users.
  • Joins Google Gemini and xAI Grok as the third AI model on the platform.

ChatGPT Joins GenAI.mil

The US Department of Defense has announced a partnership with OpenAI to add ChatGPT to GenAI.mil.[OpenAI] GenAI.mil is the Department of Defense’s dedicated generative AI platform, launched in December 2025 with Google Gemini. xAI Grok was added just before Christmas, and ChatGPT is the third to join.[Breaking Defense]

Currently, 1.1 million people are using it, and it will be expanded to 3 million personnel across the military.[DefenseScoop]

Security and Data Isolation

This ChatGPT is different from the commercial version. It runs in a government cloud, and data is isolated within the government environment. It is also not used for OpenAI model training.[OpenAI] Currently, it only handles non-secret sensitive data, and classified approval is in progress. It stems from a contract worth up to $200 million signed by the CDAO in June 2025.[Breaking Defense]

Prospects for AI Military Use

Reaching 1.1 million users just two months after launch is impressive growth. Secretary of Defense Hegseth is urging personnel to “Use AI, go to GenAI.mil.” As military use of generative AI spreads, discussions about accuracy and security are expected to grow as well.

Frequently Asked Questions (FAQ)

Q: Who can use GenAI.mil?

A: Approximately 3 million military personnel, civil servants, and contractors affiliated with the US Department of Defense are eligible. Currently, 1.1 million are active users, and the entire military has designated it as the official AI platform. The general public cannot access it.

Q: How is ChatGPT on GenAI.mil different from the regular version?

A: It runs in a government-dedicated cloud, and data is isolated within the government environment. It is not used for training OpenAI’s public models. Currently, only non-secret sensitive data can be processed, and classified processing is pending approval.

Q: What AI models are available on GenAI.mil?

A: There are currently three. It started with Google Gemini, and xAI Grok was added. ChatGPT joined as the third in February 2026. All three models are government-customized, unlike commercial versions.




Flapping Airplanes Overturns AI Learning with $180M Seed Funding

Flapping Airplanes Overturns AI Learning with $180M Seed

  • Sequoia, GV, and Index invest $180 million
  • Focus on efficient learning methods instead of massive data input
  • Mid-twenties founding team aims to be “the AGI lab for the younger generation”

$180 Million Bet on Data Efficiency

AI startup Flapping Airplanes has closed a $180 million seed round. Sequoia, GV, and Index Ventures invested.[TechCrunch]

The core argument is simple: current AI models are inefficient, and data efficiency is the real bottleneck. Humans learn to reason with very little data. They aim to apply this principle to AI.[Sequoia Capital]

Research Breakthroughs Instead of Scaling

Sequoia partner David Cahn compared two paths: “Growing LLMs with total resource mobilization” vs. “Needing 2-3 more research breakthroughs to reach AGI.” Flapping Airplanes chose the latter, aiming to reset the efficiency curve with a 5-10 year horizon.[TechCrunch]

Their slogan, “The brain is the floor, not the ceiling, for AI,” is key. Biological learning is the minimum baseline, not a limitation.

A Lab Led by Founders in Their Mid-Twenties

Ben Spector (Prod founder), Asher Spector (Stanford PhD), and Aidan Smith (formerly of Neuralink) co-founded the company.[Sequoia Capital] The company name is a paradox. Airplanes don’t flap their wings like birds. The idea is to understand the principles, not just copy nature.[Index Ventures]

While large AI companies focus on commercialization, a lab dedicated to long-term research has emerged.

Frequently Asked Questions (FAQ)

Q: What is Flapping Airplanes?

A: It’s an AI research lab that received $180 million in funding from Sequoia and others. They focus on efficient, biologically-inspired learning instead of large-scale data training, recruiting unconventional talent and concentrating on long-term research.

Q: What does “The brain is the floor, not the ceiling” mean?

A: Current AI uses more data than humans but lacks reasoning abilities. They aim to surpass human-level learning efficiency, treating it as a minimum baseline.

Q: How is it different from existing AI labs?

A: Most AI companies use scaling strategies, increasing computing power and data. This company prioritizes fundamental research, aiming to improve efficiency itself with a 5-10 year outlook.




Boston Dynamics CEO Replacement — 30-Year Veteran’s Retirement

Boston Dynamics CEO Change — 30-Year Veteran Retires

  • CEO Robert Playter to step down, effective February 27th
  • CFO Amanda McMaster appointed as interim CEO
  • Ends 6-year term leading the commercialization of Spot, Atlas, and Stretch

Playter’s 30 Years, The Final Chapter

Boston Dynamics CEO Robert Playter is retiring after 30 years with the company.[TechCrunch] His last day is February 27th. He originally served as COO before being promoted to CEO in the fall of 2019.[The Robot Report]

CFO Amanda McMaster will take over as interim CEO. She will lead the company until the board finds a permanent CEO.[The Robot Report]

Achievements of a 6-Year Term

Playter’s tenure marks a period of transition for Boston Dynamics from a research lab to a commercial enterprise. He spearheaded the full-scale commercial sales of the quadruped robot Spot and entered warehouse automation with the logistics robot Stretch.

He also maintained leadership by redesigning Atlas with electric drive. He strengthened strategic cooperation with Hyundai Motor Group and deepened AI partnerships with Google DeepMind.[TechCrunch]

Significance of the Leadership Change in the Robotics Industry

The timing of the change is significant. The humanoid robot competition is heating up as startups like Skild AI and Figure attract large-scale investments. For the new CEO, a commercialization strategy combined with Hyundai Motor Group’s manufacturing capabilities will be a key challenge.

A leader is needed to take the next step on the commercialization foundation laid by Playter. It’s worth watching who will take the helm in this transitional period for the robotics industry.

Frequently Asked Questions (FAQ)

Q: Why is Robert Playter stepping down?

A: Officially, the reason is retirement. He served as CEO for 6 years out of his 30 years with the company. It appears to be a natural generational shift rather than a management conflict. His last day is February 27th.

Q: Who is Amanda McMaster?

A: She is the CFO of Boston Dynamics. She has been appointed as interim CEO and will lead the company until the board finds a permanent successor. As a finance expert, a focus on profitability is expected.

Q: What are Boston Dynamics’ main products?

A: Three robots are key. Spot is a quadruped robot for industrial site inspection. Stretch is a warehouse logistics automation robot. Atlas is an electrically driven humanoid robot aimed at becoming a next-generation general-purpose robot.




ChatGPT Ads: Do They Really Not Affect Answers? [Policy Analysis]

ChatGPT Ad Testing: 3 Key Takeaways

  • OpenAI has started showing ads on its Free and Go plans.
  • Ads are separate from responses, and conversation content remains private from advertisers.
  • Paid subscribers (Plus and above) will not see ads.

Specifics of OpenAI’s Official Ad Policy

OpenAI began testing ChatGPT ads for adult users in the US starting February 9, 2026.[OpenAI] This applies to users on the Free and Go (US$8/month) plans. Ads appear below the ChatGPT response with a “Sponsored” label.

Ad matching is based on conversation topics, chat history, and previous ad interactions.[Search Engine Journal] For example, asking about travel plans might trigger ads for meal kits or grocery delivery.
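OpenAI has not published how its matching system works, but a crude keyword-based sketch conveys the idea of topic-to-ad matching. The categories and keywords below are invented for illustration.

```python
# Hypothetical topic-based ad matching sketch. OpenAI's actual system is
# not public; these categories and keywords are invented for illustration.
AD_CATEGORIES = {
    "travel": ["flight", "hotel", "itinerary", "trip"],
    "food": ["recipe", "meal", "grocery", "dinner"],
}

def match_ads(conversation):
    """Return the ad categories whose keywords appear in the conversation."""
    words = conversation.lower().split()
    return [category for category, keywords in AD_CATEGORIES.items()
            if any(kw in words for kw in keywords)]

print(match_ads("Plan a trip with a hotel near the beach"))  # ['travel']
```

A production system would presumably use embeddings and the stated signals (chat history, prior ad interactions) rather than literal keyword hits, but the privacy claim is the same: topics flow to the ad matcher, raw conversations do not flow to advertisers.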

User Protections and Limitations

OpenAI states that ads will not influence ChatGPT’s responses. Sam Altman himself said, “We won’t take money to influence the answers.”[NBC News]

User controls are also in place. You can dismiss ads, disable personalization, or delete ad-related data. Ads will not appear in conversations about health, mental health, or politics. Accounts belonging to users under 18 will also not see ads.

Advertisers only receive aggregated data like impressions and clicks. They do not have access to user conversations or personal information.

Altman’s Change of Stance and Industry Reactions

Sam Altman stated in October 2024 that “the combination of AI and ads feels uniquely bad.” However, he shifted his position towards accepting contextual placement by November 2025.[Search Engine Journal]

Meanwhile, Anthropic recently highlighted Claude’s ad-free experience in a Super Bowl ad. Altman countered this as “obviously disingenuous,” arguing that ad revenue offsets the cost of providing AI to free users.

Ad pricing is also noteworthy. The initial minimum commitment is reportedly $200,000, with a CPM (cost per 1000 impressions) of around $60. This targets a premium market.
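At those reported terms, the minimum commitment implies a sizable impression volume. A quick back-of-the-envelope check:

```python
# Impressions implied by the reported terms: a $200,000 minimum commitment
# at roughly a $60 CPM ($60 per 1,000 impressions). The figures are the
# reported ones; the arithmetic is just a sanity check.
commitment = 200_000   # USD, reported minimum commitment
cpm = 60               # USD per 1,000 impressions, reported
impressions = commitment / cpm * 1000
print(f"{impressions:,.0f} impressions")  # about 3.3 million
```

Roughly 3.3 million impressions for the minimum buy puts the floor well out of reach of small advertisers, consistent with the premium-market positioning.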

Frequently Asked Questions (FAQ)

Q: Will ChatGPT paid subscribers also see ads?

A: No. Plus, Pro, Business, Enterprise, and Education subscribers will not see ads. Ads are only displayed to Free and Go plan users. Upgrading to a paid plan allows you to use ChatGPT without ads.

Q: Do ads affect the accuracy of ChatGPT responses?

A: OpenAI officially states that ads do not influence responses. Ads are displayed separately below the response and are visually distinct from the organic answer. The structure does not allow advertisers to manipulate or change the content of the responses.

Q: Can I turn off ChatGPT ads?

A: To completely remove them, you need to upgrade to a paid plan (Plus or higher). Free and Go users can dismiss individual ads, disable personalized ads, or delete ad-related data. However, there is no option to completely turn off ads themselves.


