The landscape of artificial intelligence is shifting from models that simply “chat” to models that “work.” In a major move for the open-source community, Chinese AI startup Z.ai (also known as Zhipu AI) has released GLM-5.1, a massive 754-billion parameter Mixture-of-Experts (MoE) model designed for long-duration autonomous tasks.
Unlike previous generations of AI that often “drift” or lose focus after a few dozen steps, GLM-5.1 is engineered to operate independently for up to eight hours on a single complex objective. This marks a transition from “vibe coding”—where users prompt an AI and hope for the best—to agentic engineering, where the AI acts as a self-correcting, long-term worker.
Breaking the Performance Plateau: The “Staircase” Pattern
One of the primary hurdles in AI development is the “plateau effect.” Traditionally, when an AI agent is given more tools or more time to solve a problem, its performance eventually stalls or it begins to make errors through “strategy drift.”
Z.ai’s research suggests GLM-5.1 overcomes this through a “staircase pattern” of optimization. Rather than improving linearly, the model’s progress on a task alternates between stretches of incremental tuning and sudden, structural breakthroughs.
Real-world evidence of this capability includes:
– Database Optimization: In a task involving high-performance vector database optimization, GLM-5.1 performed over 6,000 tool calls. While previous models hit a ceiling at roughly 3,500 queries per second (QPS), GLM-5.1 autonomously redesigned the system architecture, eventually reaching 21,500 QPS, roughly a six-fold improvement.
– Machine Learning Kernels: In KernelBench tests, the model demonstrated its ability to optimize complex GPU kernels. While it trails slightly behind Anthropic’s Claude Opus 4.6 in raw speedup, GLM-5.1 proved it could sustain productive work far longer than its predecessors, maintaining progress well past 1,000 tool-use turns.
“Agents could do about 20 steps by the end of last year; GLM-5.1 can do 1,700 right now,” noted Z.ai leader Lou on X.
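The ratios behind these figures are easy to check. The snippet below is only a back-of-the-envelope sketch using the numbers quoted above; the function name `improvement_factor` is an illustrative choice, not anything published by Z.ai.

```python
def improvement_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return after / before


# Figures quoted in the article (not independently measured here).
qps_factor = improvement_factor(3_500, 21_500)   # vector database throughput
step_factor = improvement_factor(20, 1_700)      # sustained agent steps

print(f"QPS improvement:  {qps_factor:.2f}x")
print(f"Step improvement: {step_factor:.0f}x")
```

On these numbers the throughput gain works out to a little over six-fold, and the jump from about 20 to 1,700 steps is an 85-fold increase.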
Benchmarking Success: Outperforming the Giants
The release of GLM-5.1 places Z.ai in direct competition with the world’s leading AI labs. On SWE-Bench Pro, a rigorous benchmark for resolving real-world GitHub issues, GLM-5.1 achieved a score of 58.4, surpassing:
– GPT-5.4 (57.7)
– Claude Opus 4.6 (57.3)
– Gemini 3.1 Pro (54.2)
The model also showed remarkable strength in specialized domains, such as math (95.3 on AIME 2026) and science reasoning (86.2 on GPQA-Diamond). Perhaps most impressively, in a test to build a Linux-style desktop environment from scratch, the model didn’t just stop at a basic interface; it autonomously built a file browser, terminal, and even functional games over an eight-hour period.
A Hybrid Business Strategy: Open Source vs. Proprietary
Z.ai is employing a sophisticated dual-track strategy to balance community growth with commercial viability:
– The Open Source Flagship (GLM-5.1): Released under a permissive MIT License, the model weights are available on Hugging Face. This builds developer trust and establishes a global standard.
– The Proprietary Sprinter (GLM-5 Turbo): A faster, closed-source version optimized for high-speed, supervised tasks. This serves as a high-margin revenue driver for enterprises needing rapid inference.
Pricing and Tiers
Z.ai has positioned GLM-5.1 as an engineering tool rather than a consumer chatbot, offering tiered subscription plans:
– Lite: $27/quarter (for lightweight workloads).
– Pro: $81/quarter (for complex workloads with faster execution).
– Max: $216/quarter (for high-volume, advanced developers).
For API users, the model is priced at $1.40 per million input tokens and $4.40 per million output tokens, placing it competitively against other high-end models like GPT-5.4 and Claude Opus 4.6.
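To make these prices concrete, here is a rough cost sketch based only on the rates quoted above. The workload size (50 million input tokens, 5 million output tokens) is a hypothetical example, and the helper name `api_cost_usd` is illustrative; actual billing may differ, for instance through caching or volume discounts.

```python
# Rates quoted in the article, in USD per million tokens.
INPUT_USD_PER_MILLION = 1.40
OUTPUT_USD_PER_MILLION = 4.40

# Quarterly subscription prices quoted in the article, in USD.
QUARTERLY_PLANS_USD = {"Lite": 27, "Pro": 81, "Max": 216}


def api_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD for a given token volume."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Hypothetical workload: 50M input tokens and 5M output tokens.
cost = api_cost_usd(50_000_000, 5_000_000)
print(f"Estimated API cost: ${cost:.2f}")

# Monthly equivalent of each quarterly plan (a quarter is three months).
monthly = {name: price / 3 for name, price in QUARTERLY_PLANS_USD.items()}
for name, price in monthly.items():
    print(f"{name}: ${price:.2f}/month")
```

Under these assumptions the sample workload costs about $92, and the Lite, Pro, and Max plans work out to $9, $27, and $72 per month respectively.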
Conclusion
The release of GLM-5.1 signals a new era where the value of an AI model is measured not just by its intelligence, but by its endurance. By proving that open-source models can handle sustained, multi-hour engineering workflows, Z.ai is challenging the dominance of Western proprietary models and redefining the boundaries of autonomous digital labor.