The Sequence Radar #696: Google AI Ultra’s New Gold-Medal Reasoning Model is Available
The new model is available to Gemini app subscribers.
📝 Editorial: Google AI Ultra’s New Gold-Medal Reasoning Model is Available

Google just made its math-olympiad gold medalist model available! Gemini 2.5 Deep Think, released this week to Google AI Ultra subscribers, has already demonstrated its extraordinary reasoning power by securing a gold-medal performance at the 2025 International Math Olympiad (IMO), solving four shortlisted proofs under timed, exam-style conditions and matching the best human competitors. The magic of Deep Think is its ability to operationalize truly parallel hypothesis streams, spawning multiple “agents” to explore solution trajectories simultaneously rather than committing to a single linear path. More details here.

At the heart of Deep Think lies the convergence of sparse mixture-of-experts (MoE) and multi-agent prompting. Its MoE backbone dynamically routes tokens among specialized expert sub-networks, delivering enormous total capacity at controlled inference cost. Layered atop this, a multi-agent orchestration mechanism dispatches parallel reasoning threads that independently propose and refine sub-solutions before synthesizing a final answer, mitigating path dependency and amplifying creative problem solving.

Technically, Gemini 2.5 Deep Think supports a 1 million-token context window and can generate up to 192,000 tokens in a single session, making it uniquely capable of deep codebase audits, extended symbolic reasoning, and multimodal investigative workflows. Its training incorporated bespoke theorem-proving corpora and reinforcement-learning datasets that reward systematic, stepwise deduction, while its multimodal design across text, vision, audio, and video paves the way for unified cross-domain reasoning tasks.

Perhaps the most compelling testament to its prowess arrived at the 2025 IMO, where Deep Think was tasked with solving four proofs drawn from the competition’s shortlist. The model achieved a gold-medal performance, matching the top human scorers, by correctly deriving all four solutions under time constraints comparable to human exam durations. Its parallel-agent framework enabled simultaneous exploration of multiple proof strategies, trimming solution time by over 50% relative to single-pass baselines and outperforming earlier dense-decoder prototypes in both accuracy and elegance of exposition.

Google plans a phased API rollout of Deep Think, offering both “with tools” and “without tools” variants alongside usage quotas and cost-management guidelines. In the Gemini app, Ultra subscribers see a dedicated toggle granting a limited number of daily Deep Think sessions, while developers will soon be able to integrate its capabilities via the Gemini API under usage-based pricing calibrated for its higher computational demands.

Gemini 2.5 Deep Think heralds a paradigm shift: AI systems that autonomously explore and evaluate multiple solution paths rather than following a single monolithic heuristic. Its success at the IMO demonstrates not only raw computational acumen but strategic reasoning akin to that of human experts. As the technical AI community embraces this leap, crucial conversations will revolve around sustainability, equitable access, and governance frameworks that ensure next-generation reasoning engines advance collective scientific and technological frontiers responsibly. The three sketches below illustrate the parallel-agent pattern, sparse MoE routing, and the likely shape of developer access.
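To make the parallel-hypothesis idea concrete, here is a minimal sketch of the pattern in plain Python. Everything in it (the explore and deep_think functions, the confidence scoring) is an illustrative assumption rather than Google’s implementation; the point is simply that several candidate strategies run concurrently and a synthesis step arbitrates among them instead of the model committing to one linear path.

```python
# Minimal sketch of parallel hypothesis exploration, assuming stubbed-out
# agent and scoring functions. Not Google's implementation.
import asyncio
import random


async def explore(strategy: str) -> dict:
    """One reasoning thread: pursue a single proof strategy to completion."""
    await asyncio.sleep(random.random())       # simulated thinking time
    return {
        "strategy": strategy,
        "solution": f"proof via {strategy}",
        "confidence": random.random(),         # stand-in self-assessment
    }


async def deep_think(problem: str, strategies: list[str]) -> dict:
    # Spawn all hypothesis streams at once instead of committing to one path.
    candidates = await asyncio.gather(*(explore(s) for s in strategies))
    # Synthesis step: here we simply keep the highest-confidence candidate;
    # a real system would critique and merge partial results.
    return max(candidates, key=lambda c: c["confidence"])


if __name__ == "__main__":
    best = asyncio.run(deep_think(
        "IMO-style inequality",
        ["induction", "generating functions", "extremal argument"],
    ))
    print(best["strategy"], "->", best["solution"])
```

The asyncio.gather call is the whole trick: every strategy gets explored to completion before any is discarded, which is what mitigates the path dependency of single-pass decoding.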
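The MoE routing described above can likewise be sketched in a few lines. This toy PyTorch module (dimensions, expert count, and the top-k value are all assumed for illustration; Gemini’s actual architecture is unpublished) shows how a learned gate activates only k experts per token, which is what keeps per-token inference cost bounded while total parameter capacity grows.

```python
# Toy sparse mixture-of-experts layer: a gate scores each token and only
# the top-k experts run for it. Hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoE(nn.Module):
    def __init__(self, d_model: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)        # learned router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [tokens, d_model]
        scores = self.gate(x)                            # [tokens, n_experts]
        weights, idx = scores.topk(self.top_k, dim=-1)   # k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slot = (idx == e).nonzero(as_tuple=True)  # tokens routed here
            if rows.numel():
                # Only these tokens pay for this expert's compute.
                out[rows] += weights[rows, slot].unsqueeze(1) * expert(x[rows])
        return out


tokens = torch.randn(10, 64)
print(SparseMoE()(tokens).shape)  # torch.Size([10, 64])
```

With 8 experts and top-2 routing, each token touches only a quarter of the expert parameters on any given forward pass, which is the capacity-versus-cost trade the editorial refers to.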
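For developer access, the phased rollout will presumably surface through the existing google-genai SDK. The calls below (genai.Client, generate_content, GenerateContentConfig, ThinkingConfig) are the SDK’s current documented interface for Gemini 2.5 models; the Deep Think model identifier is a placeholder assumption, since Google has not yet published one.

```python
# Hedged sketch of what API access might look like, based on the current
# google-genai SDK. The model id is an ASSUMPTION: Google has announced a
# phased rollout but no public Deep Think identifier yet.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-deep-think",   # assumed id, not yet published
    contents="Prove that the sum of two even integers is even.",
    config=types.GenerateContentConfig(
        # Thinking budgets already exist for Gemini 2.5 models; a Deep Think
        # variant would plausibly expose similar cost-management knobs.
        thinking_config=types.ThinkingConfig(thinking_budget=8192),
    ),
)
print(response.text)
```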
🔎 AI Research

Title: Persona Vectors: Monitoring and Controlling Character Traits in Language Models
AI Lab: Anthropic

Title: Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
AI Lab: MIT

Title: Group Sequence Policy Optimization
AI Lab: Qwen Team, Alibaba Inc.

Title: Falcon-H1: A Family of Hybrid-Head Language Models
AI Lab: Falcon LLM Team (TII UAE)

Title: SAND-Math: Using LLMs to Generate Novel, Difficult and Useful Mathematics Questions and Answers
AI Lab: AMD (Advanced Micro Devices, Inc.)
Summary: This paper introduces SAND-Math, an automated pipeline that generates challenging, novel math problems with large language models and systematically amplifies their difficulty through a technique called "Difficulty Hiking." The resulting dataset significantly outperforms other synthetic benchmarks, boosting model performance by up to 17.85 points on AIME25, and proves especially effective for augmenting and training resource-efficient mathematical reasoning models.

Title: MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
AI Lab: Google Cloud & KAIST

🤖 AI Tech Releases

GLM 4.5
Chinese AI lab Z.ai launched GLM 4.5, a powerful foundation model with strong agentic capabilities.

Llama Nemotron Super 1.5
NVIDIA released a new version of its Llama Nemotron model, optimized for multi-step agentic tasks.

Codestral 25.08
Mistral released Codestral 25.08, a complete enterprise coding stack.

📡 AI Radar