GLM-5 vs Kimi K2.5 vs MiniMax M2.5: a practical coding showdown
Three Chinese MoE models claim frontier-class coding at a fraction of Opus pricing. Here's how they actually perform in Claude Code, Cursor, and real developer workflows.
7 posts
NVIDIA's Nemotron 3 Super packs 120B parameters into 12B active, combining Mamba-2, Transformers, and a novel LatentMoE — all open-weight and purpose-built for multi-agent systems.
Kimi K2.5 delivers 95% of Opus 4.6's coding capability at 10-25× lower cost. But the benchmarks don't tell the whole story.
Google's latest frontier model more than doubles its predecessor's reasoning score in three months, leads 13 of 16 benchmarks, and ships at the same price. The adaptive compute architecture is the interesting part.
Anthropic's second major launch in two weeks puts near-flagship capability at $3/$15 per million tokens. The mid-tier label is starting to feel like a misnomer.
A 230B MoE model with 10B active parameters hits 80.2% on SWE-Bench Verified at 1/20th the cost of Opus. Here's what's real and what's hype.
Zhipu AI's 744B mixture-of-experts model ships under MIT license with frontier-class benchmarks and aggressive pricing. Here's what actually matters.