GLM-5.1
Frontier • Z.ai (Zhipu AI) • Released on 2026-04-08
Z.ai's agentic coding model, optimized for up to 8 hours of autonomous work. It is a 754B-parameter mixture-of-experts (MoE) model that can sustain up to 1,700 tool calls. It beats Opus 4.6 and GPT-5.4 on SWE-Bench Pro and is open-sourced under the MIT license on Hugging Face.
Voice of the community
“GLM-5.1 can do 1,700 tool calls. Autonomous work time may be the most important curve after scaling laws.”
“GLM-5.1 operates via a 'staircase pattern'—periods of incremental tuning punctuated by structural changes that shift the performance frontier.”
“I realized I've been using glm-5-turbo for everything the past few days and I've been very happy with the results.”
Core Specs
Context Window: 203K tokens
Max Output: not listed
Reasoning • Open Source • Text
Pros & Cons
Sentiment: 65% positive • 25% neutral • 10% negative
Pros
- Up to 8 hours of autonomous operation
- Up to 1,700 tool calls (vs. ~20 for other models)
- Beats Opus 4.6 and GPT-5.4 on SWE-Bench Pro
- Open-sourced under the MIT license (available on Hugging Face)
- "Staircase" optimization pattern on complex tasks
Cons
- Usage counts at 3x quota during peak hours (14:00–18:00 Beijing Time)
- API pricing structure not fully disclosed
- New model with limited real-world feedback
- Context window (about 200K tokens) is smaller than Opus 4.6's (1M)
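The peak-hour surcharge in the first con can be modeled as a time-of-day multiplier. Below is a minimal sketch under several assumptions: Beijing Time is a fixed UTC+8 offset, the 14:00-18:00 window is half-open, and "3x quota" means each request consumes three times its off-peak quota. The function names and quota units are hypothetical and are not part of any Z.ai API.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Beijing Time is UTC+8 with no daylight saving time.
BEIJING = timezone(timedelta(hours=8), "CST")

# Peak window from the listing (14:00-18:00 Beijing Time).
# Assumption: the window is half-open, i.e. [14:00, 18:00).
PEAK_START = time(14, 0)
PEAK_END = time(18, 0)
PEAK_MULTIPLIER = 3  # "3x quota during peak hours"


def quota_multiplier(moment: datetime) -> int:
    """Return the (hypothetical) quota multiplier for a timezone-aware moment."""
    if moment.tzinfo is None:
        raise ValueError("moment must be timezone-aware")
    local = moment.astimezone(BEIJING).time()
    return PEAK_MULTIPLIER if PEAK_START <= local < PEAK_END else 1


def quota_cost(base_units: float, moment: datetime) -> float:
    """Quota consumed by a request that would cost `base_units` off-peak."""
    return base_units * quota_multiplier(moment)


if __name__ == "__main__":
    # 07:30 UTC is 15:30 in Beijing, inside the peak window.
    peak = datetime(2026, 4, 10, 7, 30, tzinfo=timezone.utc)
    # 12:00 UTC is 20:00 in Beijing, outside the peak window.
    off_peak = datetime(2026, 4, 10, 12, 0, tzinfo=timezone.utc)
    print(quota_cost(10, peak))      # 30
    print(quota_cost(10, off_peak))  # 10
```

Requiring timezone-aware datetimes avoids silently treating a user's local clock as Beijing Time. Under these assumptions, a job scheduled for off-peak hours uses a third of the quota of the same job run at 15:00 Beijing Time.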
Pricing
Input (per 1M tokens): $0.00
Output (per 1M tokens): $0.00
Subscription: $10/month
Free trial available
Updated on 2026-03-28
Get Started
1. Visit the provider's website
2. Create an account
3. Start using the model
Benchmarks
- SWE-Bench Pro: beats Opus 4.6 and GPT-5.4
- Terminal-Bench 2.0: 63.5% (66.5% with the Claude Code harness)
- CyberGym: 68.7% (single-run pass over 1,507 tasks; ~20-point lead over GLM-5)
- MCP-Atlas: 71.8%
- T3-Bench: 70.6%
- Humanity's Last Exam: 31% (52.3% with tools)
- AIME 2026: 95.3%
- GPQA Diamond: 86.2%
- KernelBench Level 3: 3.6x geometric-mean speedup vs. the PyTorch baseline (Opus 4.6: 4.2x)
- VectorDBBench: 21,500 QPS after 655 iterations, about 6x Opus 4.6 (3,547 QPS)
- Max tool calls: 1,700
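The KernelBench Level 3 result is reported as a geometric mean of per-task speedups over a PyTorch baseline. Below is a minimal sketch of that aggregation. It assumes each per-task speedup is baseline runtime divided by the generated kernel's runtime, which the listing does not spell out, and the per-task numbers are hypothetical. The same snippet checks the VectorDBBench ratio quoted in the listing.

```python
import math


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Per-task speedup; values above 1.0 mean the candidate is faster (assumed definition)."""
    return baseline_seconds / candidate_seconds


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of positive values, computed in log space for numerical stability."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("speedups must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-task timings (seconds): (baseline, candidate).
    timings = [(2.0, 1.0), (8.0, 1.0)]
    per_task = [speedup(b, c) for b, c in timings]
    print(round(geometric_mean(per_task), 6))  # 4.0

    # VectorDBBench figures from the listing: 21,500 QPS vs. 3,547 QPS.
    print(round(21500 / 3547, 2))  # 6.06
```

A geometric mean is the usual choice for averaging ratios because it treats speedups and slowdowns symmetrically. For example, a 2x speedup on one task and a 2x slowdown on another average to 1x, while an arithmetic mean would report 1.25x.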
Reliability
Incidents (last 30 days): 0