The Sequence Radar #767: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency
Was this email forwarded to you? Sign up here.

Gemini Deep Think, Mistral 3, and Nova 2 dominated the AI headlines.

Next Week in The Sequence: Our synthetic data generation series continues with a walkthrough of the different types of rephrasing methods. We dive into Gemini 3 Deep Think. The opinion section discusses some new ideas about the future of AI evals. Subscribe and don’t miss out:

📝 Editorial: Last Week in AI: Google Logic, Amazon Utility, and Mistral Efficiency

The focus of model development shifted noticeably this week. For the past two years, the primary lever for performance gains has been scaling training data and parameter counts. However, the simultaneous releases from Google, Amazon, and Mistral suggest the industry is hitting the point of diminishing returns for pure scale. We are now entering a phase defined by “inference-time compute” and architectural specialization. The emerging stack is no longer about a single general-purpose model, but rather a set of distinct tools optimized for reasoning, latency, or efficiency.

The most technically significant release is Gemini 3 Deep Think. While previous iterations focused on code generation and creative writing, Deep Think introduces a fundamental change in how the model processes queries. Instead of standard immediate next-token prediction, Deep Think uses a “parallel thinking” process: it explores multiple reasoning paths and self-corrects before generating a final answer, a method likely involving reinforcement learning on search trees, similar to the techniques used in AlphaProof. This effectively productizes “System 2” thinking: slower, deliberative logic.

The validation of this approach is evident on the ARC-AGI-2 (Abstraction and Reasoning Corpus) benchmark, where Deep Think scored 45.1%, a substantial leap over GPT-5.1’s 17.6%. Since ARC measures a model’s ability to solve novel puzzles not seen in training data, this result indicates that spending compute at inference time is currently yielding higher returns on complex logic than simply increasing training set size.

While Google focused on logic, Amazon’s re:Invent announcements targeted practical deployment. The Nova 2 family avoids direct confrontation on reasoning benchmarks and focuses instead on latency and integration. Nova 2 Omni is a multimodal-native model designed for high-throughput agents, processing video and audio with minimal latency. The most practical development for enterprise engineers, however, is Nova Forge, a service that lets developers distill larger frontier models into smaller, domain-specific versions using their own proprietary data. This is a critical pivot for AWS: instead of just renting generic intelligence via APIs, it is building infrastructure for companies to own optimized, distilled weights. For developers, this lowers the barrier to moving from prototyping on a massive model to production on a cheaper, faster, specialized one.

Mistral’s release of Mistral Large 3 offers a necessary counterweight to the closed ecosystems of Google and Amazon. Built on a Mixture-of-Experts (MoE) architecture and trained on NVIDIA H200s, the model targets the intersection of performance and economy. Mistral Large 3 achieves parity with current frontier models on standard instruction-following tasks while carrying significantly lower computational overhead, since only a subset of experts is active for each token.
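To make that efficiency argument concrete, below is a minimal sketch of the top-k expert routing that MoE layers of this kind typically use: a router scores every expert for each token, but only a handful of expert MLPs actually run. The class name, layer sizes, and the num_experts/top_k values are illustrative placeholders, not Mistral Large 3’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Illustrative Mixture-of-Experts layer with top-k routing.

    Only top_k of the num_experts expert MLPs run for each token, which is
    why an MoE model can approach a dense model's quality at a fraction of
    the FLOPs. All sizes here are placeholders for illustration.
    """

    def __init__(self, d_model=1024, d_hidden=4096, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                               # x: (num_tokens, d_model)
        logits = self.router(x)                         # (num_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```

The per-expert loop is written for readability; production implementations batch tokens by expert, but the saving is the same: each token pays for top_k experts rather than all of them.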
For organizations restricted by data sovereignty laws or privacy concerns, this remains the most viable path: a model that is “smart enough” to rival GPT-4-class systems yet open enough to run on private infrastructure or local clusters.

The takeaway for technical teams is that model selection is becoming a routing problem. The “God Model” that handles every task is being replaced by a specialized stack: Deep Think for complex, asynchronous reasoning; Nova for real-time multimodal interfaces; and Mistral for private, high-volume data processing. The era of blind scaling is over; the era of architectural efficiency has begun.

🔎 AI Research

Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
AI Lab: Qwen Team, Alibaba Group
Summary: This paper systematically compares over 30 gating variants in softmax attention and finds that a simple head-specific sigmoid gate applied after the scaled dot-product attention (SDPA) consistently improves perplexity, benchmark scores, training stability, and tolerance to larger learning rates in both MoE and dense LLMs. The authors attribute the gains to adding non-linearity to the low-rank WV–WO mapping and to introducing query-dependent sparse gating that removes massive activations and attention sinks, yielding better long-context extrapolation and informing the design of the Qwen3-Next models.

Gold-Medal-Level Olympiad Geometry Solving with Efficient Heuristic Auxiliary Constructions
AI Lab: Microsoft Research & ETH Zurich
Summary: This paper introduces HAGeo, a purely CPU-based heuristic system for adding auxiliary constructions in Euclidean geometry that solves 28/30 problems on the IMO-30 benchmark, outperforming AlphaGeometry while being roughly 20× faster. It also builds HAGeo-409, a harder benchmark of 409 Olympiad-level problems with human-rated difficulty, to more accurately measure geometry theorem-proving capabilities.

Qwen3-VL Technical Report
AI Lab: Alibaba Qwen Team
Summary: Qwen3-VL is a family of dense and MoE vision–language models (2B–235B) with native 256K-token multimodal context, enhanced interleaved-MRoPE positional encoding, DeepStack-based multi-level vision–language fusion, and text-based video timestamping. Trained with a staged 8K→32K→256K recipe and extensive multimodal data plus RL, it achieves state-of-the-art or highly competitive results across visual QA, STEM multimodal reasoning, OCR/document understanding, grounding, video, and agentic GUI tasks.

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
AI Lab: DeepSeek-AI
Summary: DeepSeek-V3.2 augments a long-context MLA architecture with DeepSeek Sparse Attention (DSA), which uses a lightweight “lightning indexer” and top-k token selection to reduce attention complexity and significantly cut inference cost on 128K contexts without degrading quality. On top of this, a large-scale GRPO-based RL pipeline and an agentic task-synthesis framework (code, search, and general agents) yield reasoning and tool-use performance comparable to or exceeding GPT-5 and other frontier proprietary models, with a high-compute “Speciale” variant reaching gold-medal performance on IMO, IOI, and ICPC.
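To illustrate the DSA idea, here is a rough sketch of indexer-guided top-k sparse attention for a single query position. It is a conceptual stand-in, not DeepSeek’s implementation: the “indexer” below is just a random low-dimensional projection, and the sequence length, head size, and top_k value are arbitrary.

```python
import torch
import torch.nn.functional as F

def sparse_attention_topk(q, k, v, index_scores, top_k=64):
    """Sketch of indexer-guided sparse attention for one query position.

    q:            (d,)    current query vector
    k, v:         (T, d)  keys/values for the T cached context tokens
    index_scores: (T,)    cheap relevance scores from a lightweight indexer
    Only the top_k tokens ranked by the indexer take part in full attention,
    so the expensive softmax runs over top_k << T tokens.
    """
    top_k = min(top_k, k.shape[0])
    selected = index_scores.topk(top_k).indices            # (top_k,)
    k_sel, v_sel = k[selected], v[selected]                 # gather a small subset
    attn = F.softmax(q @ k_sel.T / q.shape[-1] ** 0.5, dim=-1)   # (top_k,)
    return attn @ v_sel                                      # (d,)

# Toy usage with a stand-in indexer: coarse relevance scores from a cheap
# dot product in a low-dimensional (here random) projection of the keys.
T, d = 1024, 128
q, k, v = torch.randn(d), torch.randn(T, d), torch.randn(T, d)
proj = torch.randn(d, 16)                  # hypothetical indexer projection
index_scores = (k @ proj) @ (q @ proj)     # (T,) cheap scores, one per cached token
out = sparse_attention_topk(q, k, v, index_scores, top_k=64)
print(out.shape)   # torch.Size([128])
```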
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
AI Lab: NVIDIA
Summary: ToolOrchestra trains an 8B “Orchestrator” model with GRPO to act as a central planner that calls a heterogeneous pool of tools (web search, code interpreters, specialized LLMs, and frontier generalist LLMs) through a unified function-calling interface, optimizing for outcome, efficiency, and user tool preferences. Using the synthesized ToolScale environment and RL, the Orchestrator achieves higher accuracy at substantially lower cost than GPT-5-based agents on HLE, FRAMES, and τ²-Bench, and generalizes to unseen tools and pricing configurations. A minimal sketch of this orchestration pattern appears after the releases section below.

Guided Self-Evolving LLMs with Minimal Human Supervision
AI Lab: Tencent AI Lab
Summary: The R-FEW framework stabilizes self-play-based “data-free” evolution by coupling a Challenger–Solver setup with a small pool of human “anchor” examples and an online, difficulty-based curriculum: the Challenger generates questions guided by few-shot anchors, while the Solver trains on mid-uncertainty problems drawn from both synthetic and human data. Applied to Qwen3-4B/8B base models, R-FEW delays the performance plateau of R-Zero, mitigates drift and diversity collapse, and approaches or matches General-Reasoner’s math and general-reasoning performance while using roughly 20× less human-labeled data.

SIMA 2: A Generalist Embodied Agent for Virtual Worlds
AI Lab: Google DeepMind (SIMA Team)
Summary: This paper presents SIMA 2, a Gemini-based vision-language-action agent that perceives 3D games through pixels, reasons in natural language (internal CoT and dialogue), and issues low-level keyboard-and-mouse actions, trained via large-scale human gameplay, Gemini-generated “bridge” data, and reinforcement learning across a portfolio of complex commercial and research environments. SIMA 2 approaches human success rates on diverse embodied tasks, generalizes to held-out games and photorealistic Genie 3 worlds, can self-improve via Gemini-driven task setting and reward modeling in new environments such as ASKA, and retains most of Gemini’s coding, math, and STEM reasoning capabilities.

🤖 AI Tech Releases

Gemini 3 Deep Think
Google released Gemini 3 Deep Think, its new reasoning model that achieved gold-medal performance in the recent International Math Olympiad.

Nova 2
AWS launched the Amazon Nova 2 family of foundation models, featuring new reasoning and multimodal capabilities available via Amazon Bedrock.

Mistral 3
Mistral released Mistral 3, which includes three small models (14B, 8B, and 3B) and Mistral Large 3.
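As a companion to the ToolOrchestra entry above, here is a minimal sketch of the orchestration pattern it describes: a small planner policy emits a function call into a uniform tool registry, and each episode returns both an outcome and a cost that an RL objective could trade off. The Tool dataclass, tool names, and the keyword-based policy are hypothetical stand-ins; in the paper the planner is an 8B model trained with GRPO, not a hand-written rule.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical tool registry: every tool exposes the same call signature,
# plus a rough per-call cost the orchestrator can trade off against quality.
@dataclass
class Tool:
    name: str
    cost_per_call: float           # arbitrary units, e.g. USD or token budget
    run: Callable[[str], str]

TOOLS: Dict[str, Tool] = {
    "web_search":   Tool("web_search",   0.01, lambda q: f"[search results for: {q}]"),
    "code_interp":  Tool("code_interp",  0.02, lambda q: f"[executed: {q}]"),
    "frontier_llm": Tool("frontier_llm", 1.00, lambda q: f"[frontier answer to: {q}]"),
}

def orchestrator_policy(task: str) -> dict:
    """Stand-in for the small planner model. A trained orchestrator would emit
    a function call (tool name + arguments) conditioned on the task; here a
    trivial keyword rule keeps the sketch runnable."""
    tool = "code_interp" if "compute" in task else "web_search"
    return {"tool": tool, "arguments": {"query": task}}

def run_episode(task: str) -> Tuple[str, float]:
    call = orchestrator_policy(task)                 # unified function-calling step
    tool = TOOLS[call["tool"]]
    result = tool.run(call["arguments"]["query"])
    return result, tool.cost_per_call                # outcome + cost: the two RL signals

if __name__ == "__main__":
    answer, cost = run_episode("compute the 20th Fibonacci number")
    print(json.dumps({"answer": answer, "cost": cost}, indent=2))
```

The point of the sketch is the shape of the interface: every tool, from a search API to a frontier LLM, sits behind the same function-calling schema, so a single small policy can be optimized over the whole pool for accuracy per dollar.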
📡AI Radar

You’re on the free list for TheSequence Scope and TheSequence Chat. For the full experience, become a paying subscriber to TheSequence Edge. Trusted by thousands of subscribers from the leading AI labs and universities.