The Sequence Radar #771: Last Week in AI: GPT-5.2, Mistral, and Google’s Agent Stack
A unique week in AI releases

Next Week in The Sequence: Learn more about synthetic data generation with a deep dive into multi-turn data synthesis. Our AI of the Week section dives into Google’s new agentic releases, and the opinion section looks at the state of audio models. Subscribe and don’t miss out:

📝 Editorial: Last Week in AI: GPT-5.2, Mistral, and Google’s Agent Stack

Last week felt like a clean transition from “bigger models” to “model-centric systems”: frontier releases aimed at long-running work, open-weight competitors tightening the gap, and infrastructure APIs that treat agents as first-class citizens.

OpenAI’s GPT-5.2 landed with a clear product thesis: the model isn’t just smarter, it’s designed to finish multi-step knowledge work end-to-end (spreadsheets, decks, coding, and tool-mediated workflows) without falling apart when context gets long or the task branches. What matters technically is the emphasis on “long-running agents” and more reliable tool use as a default mode, not a bolted-on trick. That’s a subtle shift: the model is being shipped as an operator for workflows, not merely a text generator. If you’re building internal copilots or external agent products, this kind of release typically translates into fewer brittle prompt contraptions and more stable task decomposition, planning, and execution across tools.

Mistral, meanwhile, doubled down on open weights as a serious deployment strategy, not a branding choice. The recent Mistral 3 family pairs small dense variants with a large sparse MoE flagship and very long context, and it reinforces a pattern: open models are no longer just “good enough”; they’re increasingly the preferred option when teams want control over hosting, latency, privacy boundaries, and fine-tuning. Just as important, Mistral’s coding-oriented line continues to push toward SWE-agent behavior: multi-file edits, codebase navigation, and structured tool use. For technical teams, the practical implication is that “agentic coding” is becoming deployable on infrastructure you can actually own, rather than something trapped behind a single vendor’s API.

Google’s big move is less about a single model and more about the plumbing that makes agents practical. The Interactions API is effectively an opinionated interface for the agent loop: server-side interaction state, tool-augmented flows, and support for long-running execution so research tasks don’t have to live inside a single synchronous request/response window. That shift matters because it normalizes agent architectures in product stacks. Instead of every team reinventing memory, state, retries, and long jobs, you get an API surface that assumes those patterns upfront. Gemini Deep Research then becomes a composable component: an agent you can embed inside your workflow rather than a monolithic “research mode” feature. The second-order effect is that “research” becomes a primitive, like search or retrieval, which can be chained with coding, data analysis, or doc generation in a single system.

Finally, Unconventional AI’s headline isn’t a model at all; it’s the growing acceptance that compute and energy are now the hard ceilings. The company’s splashy seed round signals investor conviction that “post-GPU” (or at least hybrid) architectures are becoming investable, framed around efficiency-first computing approaches to break the power wall for scaling. Whether any given approach wins is still an execution story, but the meta-trend is hard to ignore: frontier AI progress is increasingly constrained by joules, not just parameters. That constraint is shaping product decisions today: more MoEs, more distillation, more on-device and edge inference, and more obsession with end-to-end system efficiency.

Net: GPT-5.2 pushes the “agent as default interface” narrative, Mistral compresses that capability into controllable open-weight deployment, Google standardizes the agent loop with APIs designed for long-running work, and Unconventional AI reminds everyone that the next step-function gains may come from new compute abstractions as much as from new training recipes.
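Since the editorial leans on the idea of the agent loop as an API surface, a minimal sketch helps make it concrete. The code below is not Google’s actual Interactions API; every name in it (Interaction, AgentServer, start, step) is hypothetical, and it only illustrates the pattern of server-side state plus resumable, tool-augmented execution:

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch of a server-side agent loop: persisted interaction
# state, tool registration, and resumable long-running execution. These
# names are illustrative, not Google's actual Interactions API.

@dataclass
class Interaction:
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    history: list = field(default_factory=list)  # persisted server-side
    status: str = "running"                      # "running" | "done"

class AgentServer:
    def __init__(self, model: Callable):
        # model maps a history to {"tool": ..., "args": ...} or {"final": ...}
        self.model = model
        self.tools: dict = {}
        self.store: dict = {}  # stands in for a durable interaction store

    def tool(self, name: str):
        def register(fn: Callable) -> Callable:
            self.tools[name] = fn
            return fn
        return register

    def start(self, goal: str) -> str:
        it = Interaction(history=[{"role": "user", "content": goal}])
        self.store[it.id] = it
        return it.id  # caller polls with the id; no single long-lived request

    def step(self, interaction_id: str) -> Interaction:
        it = self.store[interaction_id]
        action = self.model(it.history)
        if "final" in action:
            it.history.append({"role": "assistant", "content": action["final"]})
            it.status = "done"
        else:
            # Tool-augmented step: run the tool, feed the result back as state.
            result = self.tools[action["tool"]](**action["args"])
            it.history.append({"role": "tool", "name": action["tool"], "content": result})
        return it
```

The design point is that memory, retries, and long jobs live behind the API: a client (or a scheduler) keeps calling step until status flips to "done", so a research-style task never has to fit inside one synchronous request/response window.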
🔎 AI Research

BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain
AI Lab: Weizmann Institute of Science & MIT
Summary: This paper introduces BrainExplore, an automated pipeline that decomposes fMRI activity across visual cortex into thousands of interpretable patterns using PCA/NMF/ICA and sparse autoencoders, aided by an image-to-fMRI model to synthesize extra responses. The system links each pattern to natural images and natural-language concepts, revealing fine-grained visual representations (e.g., specific actions, body parts, and scene types) across different brain regions.

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
AI Lab: Carnegie Mellon University
Summary: The authors build a controlled synthetic reasoning framework to disentangle how pre-training, mid-training, and RL each contribute to reasoning generalization in language models. They find that RL yields true capability gains only when operating at the model’s “edge of competence,” that minimal pre-training exposure is required for contextual transfer, that mid-training is a key but underexplored driver of performance under fixed compute, and that process-level rewards reduce reward hacking and improve reasoning fidelity.

Learning Unmasking Policies for Diffusion Language Models
AI Lab: Apple (with UvA & MIT collaborators)
Summary: This work treats masked diffusion LM sampling as a Markov decision process and trains a lightweight transformer policy via RL to decide which tokens to unmask at each step based on model confidences. The learned policies match or surpass state-of-the-art heuristic samplers like Fast-dLLM, especially outside the semi-autoregressive regime, and transfer reasonably across models, sequence lengths, and domains.
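As a reference point for what the learned policy replaces, here is a minimal confidence-based unmasking step of the kind heuristic samplers use. This is a sketch under assumptions, not code from the paper: the function name unmask_step, the tensor shapes, and the top-k-by-max-probability rule are all illustrative, and the paper’s contribution is precisely to learn this reveal decision with RL instead of hard-coding it:

```python
import torch

# Minimal sketch of one unmasking step for a masked diffusion LM.
# Hypothetical helper: reveals the k masked positions where the model is
# most confident (max softmax probability). The paper frames exactly this
# reveal decision as an MDP and learns it with RL rather than using
# a fixed heuristic like the one below.

def unmask_step(logits: torch.Tensor, tokens: torch.Tensor,
                mask_id: int, k: int) -> torch.Tensor:
    """logits: (seq_len, vocab); tokens: (seq_len,), equal to mask_id where masked."""
    probs = logits.softmax(dim=-1)
    conf, pred = probs.max(dim=-1)  # per-position confidence and argmax token
    # Only masked positions compete for unmasking.
    conf = conf.masked_fill(tokens != mask_id, float("-inf"))
    num_masked = int((tokens == mask_id).sum())
    reveal = conf.topk(min(k, num_masked)).indices
    out = tokens.clone()
    out[reveal] = pred[reveal]      # commit the most confident predictions
    return out
```

In a full sampler this step runs inside the denoising loop, re-querying the model after each reveal until no mask tokens remain; the learned policy makes the which-positions choice adaptive instead of greedy.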
ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
AI Lab: Meta Superintelligence Labs (with UC Berkeley & UCSF)
Summary: ThreadWeaver introduces an adaptive parallel reasoning framework where models spawn and join multiple reasoning threads using lightweight control tokens, implemented entirely on top of standard autoregressive inference engines via a trie-based training/inference design. Combined with a parallelization-aware GRPO variant, the system matches the accuracy of strong sequential CoT models while reducing token latency by up to ~1.5× on challenging math benchmarks.

URANIA: Differentially Private Insights into AI Use
AI Lab: Google Research
Summary: URANIA is a differentially private pipeline for summarizing large-scale LLM chatbot logs using DP clustering, partition selection, and histogram-based keyword extraction, followed by LLM-generated cluster summaries. The authors show that URANIA can provide useful, high-level usage insights comparable to a simplified CLIO-style baseline while enjoying formal end-to-end (ε, δ)-DP guarantees and improved robustness under membership-style privacy attacks.
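The histogram stage of a pipeline like this is easy to make concrete. The sketch below is a generic Laplace-mechanism example, not URANIA’s implementation: the function name dp_keyword_histogram, the one-contribution-per-user sensitivity assumption, and the fixed threshold (a crude stand-in for DP partition selection) are all assumptions made for illustration:

```python
import numpy as np

# Generic Laplace-mechanism sketch for the histogram step of a DP pipeline.
# Not URANIA's implementation: the paper combines DP clustering, partition
# selection, and keyword extraction; this only shows how a single keyword
# histogram can be released with an epsilon-DP counting guarantee.

def dp_keyword_histogram(counts, epsilon: float, threshold: float = 10.0,
                         seed=None) -> dict:
    """Assumes each user contributes at most 1 to each keyword's count
    (L1 sensitivity = 1), so Laplace noise with scale 1/epsilon suffices."""
    rng = np.random.default_rng(seed)
    noisy = {kw: c + rng.laplace(scale=1.0 / epsilon) for kw, c in counts.items()}
    # Drop small noisy counts: a crude stand-in for DP partition selection,
    # which keeps rare keywords from leaking membership information.
    return {kw: round(c, 1) for kw, c in noisy.items() if c >= threshold}

# Usage: counts aggregated per keyword, one contribution per user.
print(dp_keyword_histogram({"coding": 120, "travel": 40, "rare-topic": 2},
                           epsilon=1.0, seed=0))
```

With epsilon = 1.0 the noisy counts track the true ones closely for popular keywords, while the threshold suppresses rare keywords whose mere presence in the output could reveal information about individual users.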
Towards a Science of Scaling Agent Systems
AI Lab: Google Research & Google DeepMind (with MIT)
Summary: This paper systematically compares single-agent and several multi-agent architectures across four agentic benchmarks and three LLM families under matched tools, prompts, and compute to isolate coordination effects. It derives a quantitative scaling model showing when multi-agent coordination helps or hurts (highlighting a tool-coordination trade-off, a capability saturation point where extra agents give negative returns, and architecture-dependent error amplification) and uses these insights to predict the optimal agent topology for new tasks.

🤖 AI Tech Releases

GPT-5.2
OpenAI released GPT-5.2, highly optimized for productivity work.

Gemini Deep Research Agent
Google released a new Deep Research agent with advanced tool capabilities.

Interactions API
Google also released the Interactions API, specifically designed for complex agentic tasks.

FACTS Benchmark
Google DeepMind released the FACTS Benchmark Suite, three benchmarks to evaluate factuality in AI models.

Devstral 2
Mistral open sourced Devstral 2, their next-generation family of coding models.

GLM-4.6V
Z.ai open sourced GLM-4.6V, a new multimodal model with native tool usage.

📡 AI Radar

You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.