The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency
Was this email forwarded to you? Sign up here.

Gemini Deep Think, Mistral 3, and Nova 2 dominated the AI headlines.

Next Week in The Sequence: Our synthetic data generation series continues with a walkthrough of the different types of rephrasing methods. We dive into Gemini 3 Deep Think. The opinion section discusses some new ideas about the future of AI evals. Subscribe and don’t miss out:

📝 Editorial: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

The focus of model development shifted noticeably this week. For the past two years, the primary lever for performance gains has been scaling training data and parameter counts. However, the simultaneous releases from Google, Amazon, and Mistral suggest the industry is hitting the point of diminishing returns for pure scale. We are now entering a phase defined by “inference-time compute” and architectural specialization. The emerging stack is no longer about a single general-purpose model, but rather a set of distinct tools optimized for reasoning, latency, or efficiency.

The most technically significant release is Gemini 3 Deep Think. While previous iterations focused on code generation and creative writing, Deep Think introduces a fundamental change in how the model processes queries. Instead of standard immediate next-token prediction, Deep Think uses a “parallel thinking” process: it explores multiple reasoning paths and self-corrects before generating a final answer, a method likely involving reinforcement learning on search trees, similar to the techniques used in AlphaProof. This effectively productizes “System 2” thinking: slower, deliberative logic.

The validation of this approach is evident on the ARC-AGI-2 (Abstraction and Reasoning Corpus) benchmark, where Deep Think scored 45.1%, a substantial leap over GPT-5.1’s 17.6%. Since ARC measures a model’s ability to solve novel puzzles not seen in training data, this result indicates that spending compute at inference time is currently yielding higher returns on complex logic than simply increasing training set size.

While Google focused on logic, Amazon’s re:Invent announcements targeted practical deployment. The Nova 2 family avoids direct confrontation on reasoning benchmarks and focuses instead on latency and integration. Nova 2 Omni is a multimodal-native model designed for high-throughput agents, processing video and audio with minimal latency. The most practical development for enterprise engineers, however, is Nova Forge, a service that lets developers distill larger frontier models into smaller, domain-specific versions using their own proprietary data. This is a critical pivot for AWS: instead of just renting generic intelligence via APIs, it is building infrastructure for companies to own optimized, distilled weights. For developers, this lowers the barrier to moving from prototyping on a massive model to production on a cheaper, faster, specialized one.

Mistral’s release of Mistral Large 3 offers a necessary counterweight to the closed ecosystems of Google and Amazon. Built on a Mixture-of-Experts (MoE) architecture and trained on NVIDIA H200s, the model targets the intersection of performance and economy. Mistral Large 3 achieves parity with current frontier models on standard instruction-following tasks while carrying significantly lower computational overhead, since only a subset of experts is active for each token.
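To make that efficiency argument concrete, below is a minimal sketch of the top-k expert routing that MoE layers of this kind typically use: a router scores every expert for each token, but only a handful of expert MLPs actually run. The class name, layer sizes, and the num_experts/top_k values are illustrative placeholders, not Mistral Large 3’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer with top-k routing.

    Only top_k of the num_experts expert MLPs run for each token, which is
    why an MoE model can approach a dense model's quality at a fraction of
    the FLOPs. All sizes here are placeholders for illustration.
    """

    def __init__(self, d_model=1024, d_hidden=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (num_tokens, d_model)
        logits = self.router(x)                         # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The per-expert loop is written for readability; production implementations batch tokens by expert, but the saving is the same: each token pays for top_k experts rather than all of them.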
For organizations restricted by data sovereignty laws or privacy concerns, this remains the most viable path: a model that is “smart enough” to rival GPT-4-class systems yet open enough to run on private infrastructure or local clusters.

The takeaway for technical teams is that model selection is becoming a routing problem. The “God Model” that handles every task is being replaced by a specialized stack: Deep Think for complex, asynchronous reasoning; Nova for real-time multimodal interfaces; and Mistral for private, high-volume data processing. The era of blind scaling is over; the era of architectural efficiency has begun.

🔎 AI Research

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
AI Lab: Qwen Team, Alibaba Group
Summary: This paper systematically compares over 30 gating variants in softmax attention and finds that a simple head-specific sigmoid gate applied after the scaled dot-product attention (SDPA) consistently improves perplexity, benchmark scores, training stability, and tolerance to larger learning rates in both MoE and dense LLMs. The authors attribute the gains to adding non-linearity to the low-rank WV–WO mapping and to introducing query-dependent sparse gating that removes massive activations and attention sinks, yielding better long-context extrapolation and informing the design of the Qwen3-Next models.

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions
AI Lab: Microsoft Research & ETH Zurich
Summary: This paper introduces HAGeo, a purely CPU-based heuristic system for adding auxiliary constructions in Euclidean geometry that solves 28/30 problems on the IMO-30 benchmark, outperforming AlphaGeometry while being roughly 20× faster. It also builds HAGeo-409, a harder benchmark of 409 Olympiad-level problems with human-rated difficulty, to more accurately measure geometry theorem-proving capabilities.

Qwen3-VL Technical Report
AI Lab: Alibaba Qwen Team
Summary: Qwen3-VL is a family of dense and MoE vision–language models (2B–235B) with native 256K-token multimodal context, enhanced interleaved-MRoPE positional encoding, DeepStack-based multi-level vision–language fusion, and text-based video timestamping. Trained with a staged 8K→32K→256K recipe and extensive multimodal data plus RL, it achieves state-of-the-art or highly competitive results across visual QA, STEM multimodal reasoning, OCR/document understanding, grounding, video, and agentic GUI tasks.

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
AI Lab: DeepSeek-AI
Summary: DeepSeek-V3.2 augments a long-context MLA architecture with DeepSeek Sparse Attention (DSA), which uses a lightweight “lightning indexer” and top-k token selection to reduce attention complexity and significantly cut inference cost on 128K contexts without degrading quality. On top of this, a large-scale GRPO-based RL pipeline and an agentic task-synthesis framework (code, search, and general agents) yield reasoning and tool-use performance comparable to or exceeding GPT-5 and other frontier proprietary models, with a high-compute “Speciale” variant reaching gold-medal performance on IMO, IOI, and ICPC.
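To illustrate the DSA idea, here is a rough sketch of indexer-guided top-k sparse attention for a single query position. It is a conceptual stand-in, not DeepSeek’s implementation: the “indexer” below is just a random low-dimensional projection, and the sequence length, head size, and top_k value are arbitrary.

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, index_scores, top_k=64):
    """Sketch of indexer-guided sparse attention for one query position.

    q:            (d,)    current query vector
    k, v:         (T, d)  keys/values for the T cached context tokens
    index_scores: (T,)    cheap relevance scores from a lightweight indexer
    Only the top_k tokens ranked by the indexer take part in full attention,
    so the expensive softmax runs over top_k << T tokens.
    """
    top_k = min(top_k, k.shape[0])
    selected = index_scores.topk(top_k).indices            # (top_k,)
    k_sel, v_sel = k[selected], v[selected]                 # gather a small subset
    attn = F.softmax(q @ k_sel.T / q.shape[-1] ** 0.5, dim=-1)   # (top_k,)
    return attn @ v_sel                                      # (d,)

# Toy usage with a stand-in indexer: coarse relevance scores from a cheap
# dot product in a low-dimensional (here random) projection of the keys.
T, d = 1024, 128
q, k, v = torch.randn(d), torch.randn(T, d), torch.randn(T, d)
proj = torch.randn(d, 16)                  # hypothetical indexer projection
index_scores = (k @ proj) @ (q @ proj)     # (T,) cheap scores, one per cached token
out = sparse_attention_topk(q, k, v, index_scores, top_k=64)
print(out.shape)   # torch.Size([128])
```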
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
AI Lab: NVIDIA
Summary: ToolOrchestra trains an 8B “Orchestrator” model with GRPO to act as a central planner that calls a heterogeneous pool of tools (web search, code interpreters, specialized LLMs, and frontier generalist LLMs) through a unified function-calling interface, optimizing for outcome, efficiency, and user tool preferences. Using the synthesized ToolScale environment and RL, the Orchestrator achieves higher accuracy at substantially lower cost than GPT-5-based agents on HLE, FRAMES, and τ²-Bench, and generalizes to unseen tools and pricing configurations. A minimal sketch of this orchestration pattern appears after the releases section below.

Guided Self-Evolving LLMs with Minimal Human Supervision
AI Lab: Tencent AI Lab
Summary: The R-FEW framework stabilizes self-play-based “data-free” evolution by coupling a Challenger–Solver setup with a small pool of human “anchor” examples and an online, difficulty-based curriculum: the Challenger generates questions guided by few-shot anchors, while the Solver trains on mid-uncertainty problems drawn from both synthetic and human data. Applied to Qwen3-4B/8B base models, R-FEW delays the performance plateau of R-Zero, mitigates drift and diversity collapse, and approaches or matches General-Reasoner’s math and general-reasoning performance while using roughly 20× less human-labeled data.

SIMA 2: A Generalist Embodied Agent for Virtual Worlds
AI Lab: Google DeepMind (SIMA Team)
Summary: This paper presents SIMA 2, a Gemini-based vision-language-action agent that perceives 3D games through pixels, reasons in natural language (internal CoT and dialogue), and issues low-level keyboard-and-mouse actions, trained via large-scale human gameplay, Gemini-generated “bridge” data, and reinforcement learning across a portfolio of complex commercial and research environments. SIMA 2 approaches human success rates on diverse embodied tasks, generalizes to held-out games and photorealistic Genie 3 worlds, can self-improve via Gemini-driven task setting and reward modeling in new environments such as ASKA, and retains most of Gemini’s coding, math, and STEM reasoning capabilities.

🤖 AI Tech Releases

Gemini 3 Deep Think
Google released Gemini 3 Deep Think, its new reasoning model that achieved gold-medal performance in the recent International Math Olympiad.

Nova 2
AWS launched the Amazon Nova 2 family of foundation models, featuring new reasoning and multimodal capabilities available via Amazon Bedrock.

Mistral 3
Mistral released Mistral 3, which includes three small models (14B, 8B, and 3B) and Mistral Large 3.
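As a companion to the ToolOrchestra entry above, here is a minimal sketch of the orchestration pattern it describes: a small planner policy emits a function call into a uniform tool registry, and each episode returns both an outcome and a cost that an RL objective could trade off. The Tool dataclass, tool names, and the keyword-based policy are hypothetical stand-ins; in the paper the planner is an 8B model trained with GRPO, not a hand-written rule.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical tool registry: every tool exposes the same call signature,
# plus a rough per-call cost the orchestrator can trade off against quality.
@dataclass
class Tool:
    name: str
    cost_per_call: float           # arbitrary units, e.g. USD or token budget
    run: Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "web_search":   Tool("web_search",   0.01, lambda q: f"[search results for: {q}]"),
    "code_interp":  Tool("code_interp",  0.02, lambda q: f"[executed: {q}]"),
    "frontier_llm": Tool("frontier_llm", 1.00, lambda q: f"[frontier answer to: {q}]"),
}

def orchestrator_policy(task: str) -> dict:
    """Stand-in for the small planner model. A trained orchestrator would emit
    a function call (tool name + arguments) conditioned on the task; here a
    trivial keyword rule keeps the sketch runnable."""
    tool = "code_interp" if "compute" in task else "web_search"
    return {"tool": tool, "arguments": {"query": task}}

def run_episode(task: str) -> Tuple[str, float]:
    call = orchestrator_policy(task)                 # unified function-calling step
    tool = TOOLS[call["tool"]]
    result = tool.run(call["arguments"]["query"])
    return result, tool.cost_per_call                # outcome + cost: the two RL signals

if __name__ == "__main__":
    answer, cost = run_episode("compute the 20th Fibonacci number")
    print(json.dumps({"answer": answer, "cost": cost}, indent=2))
```

The point of the sketch is the shape of the interface: every tool, from a search API to a frontier LLM, sits behind the same function-calling schema, so a single small policy can be optimized over the whole pool for accuracy per dollar.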
📡AI Radar

You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.