// frontier ml. production ai. essays every Tuesday.

YOMXXX

Deep technical writing on LLMs, Agents, RAG, and the systems beneath. New essay every Tuesday.

★ THIS WEEK · DEEP DIVE

Claude Opus 4.6 vs GPT-5.4 vs Gemini 3.1 Pro: A Hands-On Comparison of Spring 2026's Frontier LLMs

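An in-depth comparison of 2026's three leading frontier LLMs on benchmark scores, real-world coding performance, pricing, and best-fit use cases. The evaluation draws on benchmarks such as SWE-bench, HumanEval, and MMLU-Pro, plus hands-on rewrites of real projects.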

Paper · 11 min read · 05/09/26
editor-picks.json Editor's Picks
// workshop.md 18 essays
// long-form.md 13 essays
// paper.md 11 essays
// tools.md 11 essays
// weekly.md 2 essays

// subscribe to yomxxx weekly

1 deep essay + 1 weekly digest, every Tuesday. No spam, ever.

// subscriptions open soon