Gemini 3, #1 in AI Chess: Game Arena Expands to Poker and Werewolf

  • Gemini 3 takes #1 on the Game Arena chess leaderboard
  • Poker and Werewolf newly added
  • AI poker tournament results to be released on February 4

What Happened?

Google DeepMind has expanded Kaggle Game Arena: Gemini 3 claimed the #1 spot in chess, and Poker and Werewolf were added as new games.[Google Blog]

In the first tournament in August 2025, o3 crushed Grok 4 4-0.[Chess.com] This time, Gemini 3 took the crown.

Poker uses a heads-up no-limit hold’em format. Werewolf is the platform’s first team-based natural language game, in which AIs must persuade and deceive through conversation alone.[Google Blog]

Why Does This Matter?

Honestly, this is more than a simple game tournament. It’s an attempt to use games to break through the saturation problem of static benchmarks.[Digit]

Personally, I find Werewolf the most meaningful. Communication and negotiation are core capabilities for AI agents.

Gemini 3’s #1 chess ranking is also notable. Win rates increase with longer inference time, and Gemini 3 Pro sits at the top alongside GPT-5.[EPAM]

What’s Next?

Once the poker results are released on February 4, we’ll get a first ranking of how well these models manage risk under incomplete information.

But there’s a challenge: in the 2025 tournament, several AIs were disqualified for making illegal moves.[Chess.com] Rule compliance remains an unsolved problem.
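
To make the failure mode concrete, here is a minimal sketch of how a harness can enforce rule compliance, using the open-source python-chess library. The model_propose_move function is a hypothetical stand-in for an LLM call, and the retry-then-forfeit policy is my assumption; the actual Game Arena harness may handle violations differently.

```python
import random
import chess  # pip install python-chess

def model_propose_move(fen: str) -> str:
    # Hypothetical stand-in for an LLM call. For demonstration it
    # occasionally returns a nonsense move, mimicking the illegal-move
    # failures that got models disqualified in 2025.
    board = chess.Board(fen)
    if random.random() < 0.3:
        return "e2e9"  # no such square: simulates a hallucinated move
    return random.choice(list(board.legal_moves)).uci()

def play_validated_move(board: chess.Board, max_retries: int = 3) -> chess.Move:
    """Ask the model for a move; re-prompt on illegal output, forfeit after max_retries."""
    for _ in range(max_retries):
        raw = model_propose_move(board.fen())
        try:
            move = board.parse_uci(raw.strip())  # ValueError if unparseable or illegal here
        except ValueError:
            continue  # re-prompt the model
        board.push(move)
        return move
    raise RuntimeError("Retry budget exhausted: treated as a forfeit")

board = chess.Board()
print(play_validated_move(board))  # e.g. g1f3
```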

Frequently Asked Questions (FAQ)

Q: Do the AIs compete against dedicated chess engines?

A: No. Game Arena pits general-purpose LLMs only against each other; dedicated engines like Stockfish are not eligible to participate. The goal is to measure the strategic reasoning of general-purpose AI. In the 2025 tournament, just eight general-purpose models participated, including GPT, Gemini, Claude, and Grok. Direct Elo comparisons with chess engines are therefore meaningless.
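
For context, an Elo rating is defined only relative to a pool of opponents. A quick sketch of the expected-score formula shows what a rating actually encodes; the numbers below are illustrative, not tournament figures.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Elo expected score of player A vs. player B: 1 / (1 + 10^((Rb - Ra) / 400))."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# A 200-point gap predicts roughly a 76% score, but only against opponents
# rated in the SAME pool. LLM-vs-LLM ratings and engine-vs-engine ratings
# come from disjoint pools, so comparing them predicts nothing.
print(round(expected_score(1600, 1400), 2))  # 0.76
```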

Q: Do the AIs actually lie in Werewolf?

A: Yes. Werewolf is a social deduction game in which players must deceive one another depending on their roles. The AIs reason and deceive through natural language conversation alone. It works well as a Theory of Mind test, and it bears directly on agent negotiation and user-intent understanding in enterprise settings.

Q: Can regular people participate?

A: Yes. It’s an open, Kaggle-based platform with the code publicly available on GitHub, and anyone can build and submit an agent. Individual developers, not just large research labs, can benchmark their models on the public leaderboard. The key is the low barrier to entry, as the sketch below shows.
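
As a rough illustration of how small a submission can be, here is a sketch in the style of a kaggle_environments agent. The observation field name (obs.board as a FEN string) and the UCI return format are my assumptions; check the official Game Arena repository for the actual interface.

```python
import random
import chess  # pip install python-chess

def agent(obs, config):
    # Assumed interface: obs.board carries the position as a FEN string
    # and the harness expects a UCI move string back.
    board = chess.Board(obs.board)
    move = random.choice(list(board.legal_moves))  # placeholder policy: random legal move
    return move.uci()
```

A real entry would replace the random policy with a model call, but the submission surface is genuinely this small.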


If you found this article useful, please subscribe to AI Digester.
