// frontier ml. production ai. essays every Tuesday.
▸YOMXXX
Deep technical writing on LLMs, Agents, RAG, and the systems beneath. New essay every Tuesday.
★ THIS WEEK · DEEP DIVE
Claude Opus 4.6 vs GPT-5.4 vs Gemini 3.1 Pro: Spring 2026 Hands-On LLM Comparison
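An in-depth comparison of 2026's three leading frontier LLMs across benchmark scores, real-world coding performance, pricing, and suitable use cases. Based on benchmarks including SWE-bench, HumanEval, and MMLU-Pro, plus hands-on tests rewriting real projects.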
editor-picks.json › Editor's Picks