GLM-5 vs Kimi K2.5 vs MiniMax M2.5: a practical coding showdown
Three Chinese MoE models claim frontier-class coding at a fraction of Opus pricing. Here's how they actually perform in Claude Code, Cursor, and real developer workflows.
7 posts
NVIDIA's Nemotron 3 Super packs 120B parameters into 12B active, combining Mamba-2, Transformers, and a novel LatentMoE — all open-weight and purpose-built for multi-agent systems.
Kimi K2.5 delivers 95% of Opus 4.6's coding capability at 10-25× lower cost. But the benchmarks don't tell the whole story.
Google's latest frontier model more than doubles its predecessor's reasoning score in three months, leads 13 of 16 benchmarks, and ships at the same price. The adaptive compute architecture is the interesting part.
Anthropic's second major launch in two weeks puts near-flagship capability at $3/$15 per million tokens. The mid-tier label is starting to feel like a misnomer.
A 230B MoE model with 10B active parameters hits 80.2% on SWE-Bench Verified at 1/20th the cost of Opus. Here's what's real and what's hype.
Zhipu AI's 744B mixture-of-experts model ships under MIT license with frontier-class benchmarks and aggressive pricing. Here's what actually matters.