The Sequence Radar
Next Week in The Sequence:

We continue our series on synthetic data generation with an exploration of the current types of generative synthesis. In the AI of the week, we will dive into the amazing Olmo 3 stack. In our opinion section, we are going to dive into the state of open source AI. Subscribe and don’t miss out:

📝 Editorial: Last Week in AI: Grok 4.1, Gemini 3 Pro and the Agentic Stack

Today I’d like to start with a personal note before diving into this week’s AI developments. Two years ago, I co-founded a company called NeuralFabric to pretrain, post-train, distill, and fine-tune small frontier models. We were convinced that smaller, efficient models are a critical ingredient for embodied AI, mobile and IoT workloads, and many enterprise AI workflows that will never run a 400B-parameter beast in production. We built a state-of-the-art platform for training, distillation, and advanced adaptation techniques for these models, and fairly quickly, to our surprise, NeuralFabric began attracting serious attention from large enterprise players. Last week, Cisco completed the acquisition of NeuralFabric and announced its intention to use our technology as part of its next-generation enterprise AI capabilities.

For me, this journey has been an incredible learning experience about the real state of the AI market. It’s also what makes this newsletter special: much of what you read here is grounded in building and deploying AI systems in the wild, not just reciting papers or headlines.

With that context in mind, let’s turn to the broader developments of the week. This week in AI felt less like a batch of model drops and more like a glimpse of the emerging agentic stack: brains, tools, visuals, and a working environment all snapping into place. On the “brain” side, xAI pushed Grok 4.1, a focused upgrade that leans hard into usability rather than just leaderboard flexing.
The new release is positioned as a better “everyday” model: stronger reasoning, more consistent long-form writing, and a noticeable reduction in off-the-rails hallucinations. It ships with both a more deliberate “thinking” mode and a faster interactive mode, signaling that Grok isn’t just a quirky side project anymore but something you can actually plug into production workflows.

Google answered on the flagship front with Gemini 3 Pro, designed as a general-purpose reasoning and agentic coding model. It’s natively multimodal and built for long-context work: full repos, multi-document analysis, and sessions that blend text, images, and other media. Under the hood, Gemini 3 Pro is clearly optimized around tool use and multi-step workflows rather than just chat completion. You can think of it less as “a chatbot” and more as a planning-and-execution engine that happens to speak natural language very well.

On the visual side, Google extended the ecosystem with NanoBanana, its new image generation and editing stack. The base capabilities focus on speed and editability: character consistency, local edits, fast iterations. The more advanced tier pushes into higher resolution, more reliable text rendering inside images, better control over style and lighting, and tighter coupling with language models so you can drive fairly complex visual transformations with simple prompts. Images stop being one-shot samples and become an interactive design loop.

All of this comes together in Antigravity, a new “agent-first” development environment. Instead of the usual coding assistant that sprinkles suggestions into your IDE, Antigravity treats the model as a first-class actor: an AI that can manage an editor, a terminal, and a browser; plan multi-step tasks; execute code; run tests; and leave behind an auditable trail of what it did and why. It’s designed to be model-agnostic but is deeply wired into the Gemini stack from day one.
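That agent-first pattern boils down to a plan-act-observe loop with an audit log attached. The sketch below is a minimal, hypothetical illustration of the idea; the tool names, the `plan` helper, and the trail format are my own assumptions, not Antigravity’s actual API:

```python
import json
from typing import Callable

# Hypothetical tool registry: an agent-first IDE exposes editor, terminal,
# and browser actions as callable tools. These stubs stand in for real ones.
TOOLS: dict[str, Callable[[str], str]] = {
    "terminal": lambda cmd: f"$ {cmd}\nok",
    "editor": lambda patch: f"applied patch: {patch}",
}

def plan(task: str) -> list[dict]:
    """Stand-in for the model's planner: decompose a task into tool calls.
    A real agent would ask the LLM to produce this plan."""
    return [
        {"tool": "editor", "arg": f"fix for {task}"},
        {"tool": "terminal", "arg": "pytest -q"},
    ]

def run_agent(task: str) -> list[dict]:
    """Plan, execute each step, and keep an auditable trail of what ran."""
    trail = []
    for step in plan(task):
        result = TOOLS[step["tool"]](step["arg"])
        trail.append({"step": step, "result": result})
    return trail

trail = run_agent("flaky test in auth module")
print(json.dumps(trail, indent=2))
```

The trail is the differentiating piece: because every tool call and its result is recorded, a human can audit what the agent did and why, rather than trusting opaque inline suggestions.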
Taken together, Grok 4.1, Gemini 3 Pro, NanoBanana, and Antigravity (and, in a different corner of the ecosystem, NeuralFabric’s journey into Cisco) point in the same direction: away from “smart autocomplete” and toward agentic systems that can reason over huge contexts, act across tools, and express themselves in both code and pixels.

🔎 AI Research

Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
AI Lab: Meta AI & KAUST
Summary: This paper introduces Mixture of States (MoS), a multimodal diffusion framework with a lightweight token-wise router that dynamically selects hidden states across text and vision towers based on the denoising timestep and input content. With 3–5B parameter models, MoS-Image and MoS-Edit achieve state-of-the-art text-to-image and image-editing performance, matching or surpassing models up to four times larger while remaining highly efficient.

Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
AI Lab: Moonshot AI & Tsinghua University
Summary: Seer is a synchronous RL system for LLMs that exploits intra-group similarities in GRPO-style rollouts, introducing divided rollout with a global KV cache, context-aware scheduling, and adaptive grouped speculative decoding to reduce long-tail latency and improve hardware utilization. On large production RL workloads (Moonlight, Qwen2-VL-72B, Kimi-K2), it boosts rollout throughput by 74–97% and cuts long-tail latency by 75–93% compared to a strong synchronous baseline.

ARC-Chapter: Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
AI Lab: ARC Lab, Tencent
Summary: ARC-Chapter is a multimodal video chaptering model trained on VidAtlas, a new large-scale dataset of over 400,000 hours of videos with hierarchical annotations, combining ASR transcripts, visual captions, and OCR to produce timestamped titles, structured chapter summaries, and dense video descriptions.
The authors also propose the GRACE metric and show that ARC-Chapter sets new state-of-the-art results on VidChapters-7M and transfers strongly to dense video captioning benchmarks like YouCook2 and ActivityNet Captions.

Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance
AI Lab: Meta SuperIntelligence Labs & FAIR at Meta
Summary: This work proposes Soup Of Category Experts (SoCE), a model-souping method that selects “expert” checkpoints per weakly correlated benchmark category and combines them with optimized non-uniform weights instead of simple uniform averaging. SoCE yields consistent gains across tool-calling, multilingual math, and long-context benchmarks, including state-of-the-art results on the Berkeley Function Calling Leaderboard and improved robustness and correlation across evaluation categories.

Real-time speech-to-speech translation
AI Lab: Google DeepMind & Google Core ML
Summary: This work presents an end-to-end streaming speech-to-speech translation model that can translate in real time in the speaker’s own voice with about a two-second delay, using a transformer-based audio-to-audio architecture with RVQ audio tokens and the AudioLM/SpectroStream stack. A scalable time-synchronized data pipeline plus low-bit quantization and inference optimizations enable robust performance across multiple language pairs, powering features in Google Meet and on-device Pixel Voice Translate.

Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs
AI Lab: NVIDIA Research
Summary: This paper introduces Nemotron Elastic, an elastic training framework for hybrid Mamba–Attention LLMs that embeds multiple nested submodels (6B, 9B, 12B) inside a single parent reasoning model via an end-to-end learned router and structured masking.
Using only 110B tokens, it simultaneously produces competitive 6B and 9B variants from a 12B teacher, achieving up to 360× training cost reduction versus training separate model families and enabling constant deployment memory through zero-shot slicing of all submodels from one checkpoint.

🤖 AI Tech Releases

Gemini 3
Google launched Gemini 3, the latest version of its marquee model, with capabilities optimized for reasoning and agentic workflows.

Grok 4.1
xAI released Grok 4.1, with impressive results on top benchmarks.

Google Antigravity
Google released Antigravity, a new IDE for agentic software development.

Nano Banana Pro
Google also released Nano Banana Pro, its next-generation image generation model.

SAM 3
Meta released Segment Anything 3 (SAM 3), its object segmentation and tracking model, along with the SAM 3 playground.

Olmo 3
The Allen Institute for AI (AI2) released Olmo 3, a completely open source family of models, datasets, and training stack.

📡 AI Radar