The Sequence Radar #779: The Inference Wars and China’s AI IPO Race
Was this email forwarded to you? Sign up here The Sequence Radar #779: The Inference Wars and China’s AI IPO RaceNVIDIA's large deal with Groq and new releases by Z.ai and Minimax.Next Week in The Sequence:Our knowledge series continues diving into synthetic data generation. This time we focus on synthetic image generation. In the AI of the Week section we are going to cover the newly released GLM 4.7. In the opinion section, we will discuss three non-trivial AI areas that I am excited about fro 2026. Subscribe and don’t miss out:📝 Editorial: The Inference Wars and China’s AI IPO RaceThis week, the AI industry witnessed a fascinating divergence in strategy between East and West, marked by a massive talent consolidation in Silicon Valley and a maturing public market push in Beijing. While Nvidia solidified its iron grip on inference hardware through a strategic “acqui-hire” of Groq’s leadership, China’s leading model labs—Zhipu AI and MiniMax—simultaneously released frontier-class models and cleared major regulatory hurdles for Hong Kong IPOs. The holiday headlines were dominated by Nvidia’s decisive move to license Groq’s LPU (Language Processing Unit) technology and hire its core leadership, including founder Jonathan Ross. While rumors initially pegged this as a $20 billion acquisition, the reality is more nuanced and perhaps more ruthless: a non-exclusive licensing deal that decapitates a key competitor by absorbing its visionary talent while leaving the entity independent. This follows the “Microsoft-Inflection” playbook—avoiding antitrust gridlock while securing critical IP. For Nvidia, this isn’t just about removing a rival; it’s about acknowledging that the next phase of the AI cycle is inference. By integrating Groq’s ultra-low-latency architecture, Nvidia is future-proofing its dominance against a rising tide of specialized inference chips. While US hardware consolidates, Chinese model labs are delivering shocking performance per dollar. Zhipu AI led the charge on December 22 with the release of GLM 4.7, a model that directly challenges the Western “reasoning” class exemplified by o1 and DeepThink. By introducing “Deep Thinking” capabilities, GLM 4.7 demonstrates state-of-the-art performance in complex coding tasks, effectively positioning Zhipu as the “OpenAI of China” with a robust full-stack ecosystem. Simultaneously, MiniMax took a divergent architectural path with the release of M 2.1. Moving away from standard Transformers, this new model utilizes a linear attention mechanism combined with a Mixture-of-Experts (MoE) design. The result is a massive 4-million-token context window that offers extreme computational efficiency, which MiniMax is aggressively marketing as a “RAG Killer” capable of ingesting entire repositories for a fraction of the cost of traditional vector search pipelines. Perhaps most significantly, both Zhipu AI and MiniMax have reportedly cleared regulatory hurdles to list on the Hong Kong Stock Exchange in early 2026. This signals a new era where foundation model labs move from venture-subsidized research projects to public entities demanding revenue discipline. The “scaling laws” are bifurcating. In the US, scaling is becoming a function of capital concentration—Nvidia absorbing the best hardware minds to build bigger clusters. In China, scaling is becoming architectural and economic—optimizing for massive context and agentic reasoning to viable commercial products ahead of public listings. The race is no longer just about who has the most GPUs, but who can build a sustainable business on top of them. 🔎 AI ResearchGoogle Research 2025: Bolder breakthroughs, bigger impactAI Lab: Google Research Summary: This article outlines Google Research’s 2025 achievements, highlighting the acceleration of the “magic cycle” of research that translates breakthroughs in generative models, quantum computing, and scientific discovery into real-world impact. Key advancements include the high-factuality Gemini 3 model with generative UI, the “Quantum Echoes” algorithm for verifiable quantum advantage, and new AI tools for genomics, neuroscience, and climate resilience. Gemma Scope 2 - Technical PaperAI Lab: Google Summary: This paper introduces Gemma Scope 2, a comprehensive suite of open sparse autoencoders (SAEs) and transcoders trained on all layers of the Gemma 3 model family to advance interpretability research. The release includes novel multi-layer architectures, such as crosscoders and cross-layer transcoders, designed to capture complex circuit-level interactions and behaviors that span across multiple transformer blocks. SAM Audio: Segment Anything in AudioAI Lab: Meta Superintelligence Labs Summary: This paper presents SAM AUDIO, a foundation model that unifies audio separation by allowing users to prompt with text descriptions, visual masks, or temporal spans within a single flow-matching diffusion framework. The model achieves state-of-the-art performance across diverse domains like speech, music, and sound effects, and introduces a new benchmark, SAM AUDIO-BENCH, which includes human-labeled multimodal prompts for more realistic evaluation. INTELLECT-3: Technical ReportAI Lab: Prime Intellect Summary: This report introduces INTELLECT-3, a 106-billion parameter Mixture-of-Experts model trained via a new scalable reinforcement learning framework called prime-rl, which supports asynchronous off-policy training and continuous batching. The model outperforms similar-sized open-source models on reasoning, math, and coding benchmarks, and the authors release the full training stack, including the model weights, the RL framework, and the environment library, to democratize large-scale agentic training. Bolmo: Byteifying the Next Generation of Language ModelsAI Lab: Allen Institute for AI (Ai2), University of Cambridge, University of Washington, University of Edinburgh Summary: This paper introduces Bolmo, a family of open byte-level language models created by efficiently “byteifying” existing subword models to overcome tokenization limitations. The resulting architecture uses non-causal boundary prediction to achieve performance competitive with state-of-the-art subword models while enabling superior character-level understanding. GenEnv: Difficulty-Aligned Co-Evolution Between LLM Agents and Environment SimulatorsAI Lab: Princeton University, ByteDance Seed, Columbia University, University of Michigan, University of Chicago Summary: The authors propose GenEnv, a framework that trains agents via a co-evolutionary game where a generative simulator creates dynamic tasks aligned with the agent’s current difficulty level. This adaptive curriculum approach significantly improves agent performance across multiple benchmarks while requiring substantially less synthetic data than static augmentation methods. 🤖 AI Tech ReleasesGLM 4.7Z.ai released GLM 4.7, a new version of its marquee model with strong coding and agentic capabililties. MiniMax M2.1Minimax released M2.1, a new multimodal model optimized for complex tasks. BloomAnthropic released Bloom, a tool for behavioral evaluations of frontier models. 📡AI Radar
You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities. |
Similar newsletters
There are other similar shared emails that you might be interested in:


