The Sequence Radar #755: Last Week in AI: Worlds Built, Models Refined, and Legends Move On

World models dominated last week in AI, but that wasn't all.

Next Week in The Sequence: Our series about synthetic data generation continues with an intro to generative synthesis. In the AI of the week, we are going to dive into DeepMind's new SIMA world model. Our opinion section will dive into the possibilities and challenges of world models. Subscribe and don't miss out.

📝 Editorial: Last Week in AI: Worlds Built, Models Refined, and Legends Move On

It was a loaded week in AI. Several trends (world models, large language models, and multilingual speech) are starting to converge into a more coherent picture of where the field is heading. We also saw major fundraising events and some surprising departures.

On the world-model side, Marble from World Labs illustrates how far things have moved beyond static perception. Rather than just taking in images or video and labeling what's there, Marble is designed to reconstruct persistent 3D environments from multimodal input and maintain coherent state over time. The model can assemble scenes, objects, and surfaces into a navigable representation that agents can inhabit and interact with. For practitioners, this suggests a different kind of data and tooling stack: pipelines for 3D assets and scene graphs, training loops that care about continuity and physical plausibility, and evaluation focused on how well an environment supports downstream tasks rather than just reconstruction quality.

World Labs itself is a good illustration of how individual researchers shape these trajectories. As a co-founder, Fei-Fei Li brings a track record that includes ImageNet, large-scale visual recognition benchmarks, and a sustained push toward data-centric AI. Her work has repeatedly redefined how the field thinks about supervision, representation, and the role of high-quality datasets.
In many ways, Marble can be seen as an extension of that vision: moving from static labeled images to rich, interactive worlds as the substrate on which intelligent behavior is learned. Her broader impact, through research, open datasets, and institution-building, is part of why world models are emerging as a serious, well-founded direction rather than just a speculative trend.

DeepMind's SIMA-style agents push the same idea from a different angle: controlling agents that operate inside complex virtual worlds. Instead of treating a world model as a passive representation, SIMA emphasizes interactive behavior: navigating spaces, manipulating objects, following natural-language instructions, and learning from feedback within 3D environments. This brings together perception, action, and language into a single loop. If Marble is about building the stage, SIMA is about training the actors. For the ecosystem, that means more attention to interfaces between foundation models and engines, from standardized action spaces to APIs for logging trajectories and rewards at scale.

Before turning to language models, another notable development this week came from the developer-tooling ecosystem: Cursor announced a $2.3 billion funding round, lifting its valuation to $29.3 billion. Cursor has quickly become one of the most widely adopted AI coding assistants because of its tight integration between the editor, the model, and the developer workflow. The new funding round signals that AI-assisted software engineering is moving from novelty to necessity, with tools increasingly expected to reason over entire codebases, manage iterative changes, and support autonomous refactoring. For engineering teams, this validates a broader shift: the future of IDEs won't be defined by plugins that bolt AI onto legacy workflows, but by environments designed from the ground up around collaboration between humans and models.

In parallel, large language models are entering a refinement phase.
The release of GPT-5.1 is less about a shocking capabilities jump and more about making high-end models more controllable, more consistent, and more aligned with real workflows. The new variants emphasize better instruction following, adaptive reasoning (deciding when to think briefly versus in more depth), and richer persona control so that the model can respond in styles tuned to different products and audiences. The signal here is clear: for many real-world applications, the bottleneck is no longer "can the model do it?" but "can the model do it in a way that is predictable, steerable, and pleasant to use?"

Speech and language coverage also took a meaningful step forward with the release of a large-scale omnilingual ASR system capable of transcribing an enormous number of languages, including many that have historically been neglected in machine learning benchmarks. By bringing hundreds of low-resource languages into a reasonable error range and offering an open ecosystem around the models and data, this kind of system materially lowers the barrier for building voice interfaces, accessibility tools, dubbing and captioning pipelines, and localized products for communities that have never really been served by mainstream AI.

Overlaying all of this is a major human signal: Yann LeCun's decision to leave Meta marks the end of one of the most influential tenures in industrial AI research. As Chief AI Scientist, he helped shape Meta's long-term bet on self-supervised learning, championed the development and deployment of large-scale computer vision and recommendation systems, and pushed for a more open, research-centric culture through FAIR. His work inside Meta often provided a counterweight to the industry's narrow focus on language-only models, emphasizing energy-efficient architectures, representation learning, and long-horizon autonomy.
His departure closes a chapter in which Meta played a central role in advancing deep learning at scale, and it raises important questions about how the company will define its research identity going forward.

Quite a week! Let's dive into the details.

🔎 AI Research

SIMA 2 – An agent that plays, reasons, and learns with you in virtual 3D worlds
AI Lab: Google DeepMind / Google DeepMind Games & Agents team
Summary: SIMA 2 is a Gemini-powered generalist agent that operates in commercial and AI-generated 3D games by "seeing" the screen and controlling a virtual keyboard and mouse, following natural-language, sketch, and emoji instructions. It goes beyond the original SIMA by setting its own goals, explaining its plans, generalising skills across games, and continuing to improve through self-play in both human-made and Genie-generated worlds.

Understanding neural networks through sparse circuits
AI Lab: OpenAI
Summary: OpenAI trains weight-sparse transformer models whose internal computations decompose into small, disentangled "circuits," making it possible to isolate the subnetworks responsible for specific behaviors instead of reverse-engineering a dense tangle of weights. By scaling these sparse models, they show you can maintain strong capabilities while making internal mechanisms more transparent, pointing to a path for safer, more interpretable AI systems.

LeJEPA: Provable and Scalable Self-Supervised Learning Without the Heuristics
AI Lab: Meta FAIR; New York University (NYU); Brown University
Summary: Introduces LeJEPA, a JEPA objective grounded in a proof that isotropic Gaussian embeddings minimize worst-case downstream risk. It enforces this via SIGReg, a sketched Epps–Pulley characteristic-function test over random 1-D projections, thus removing stop-gradients, teacher–student schemes, and other brittle heuristics with a single trade-off λ and linear time/memory.
Across 60+ architectures (scaling to ~1.8B-parameter ViT-g) it trains stably, outperforms frontier transfers in in-domain settings, and its training loss closely tracks linear-probe accuracy, enabling label-free model selection.

Llama-Embed-Nemotron-8B: A Universal Text Embedding Model for Multilingual and Cross-Lingual Tasks
AI Lab: NVIDIA
Summary: Presents an open-weights 8B embedding model that ranks #1 on the MMTEB (multilingual) leaderboard via a bi-directional Llama-3.1 encoder, diverse synthetic and non-synthetic training data, and model merging. It delivers strong retrieval/STS/classification across 250+ languages and details ablations on loss design, SDG models, and merging.

MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
AI Lab: Microsoft Research India
Summary: Introduces a planning-and-tools framework plus a vision-based critic that verifies answers against automatically generated criteria, boosting image/video QA. The agent outperforms strong MLLMs and tool pipelines on MMVET/MMMU and long-form video (EgoSchema), with the critic adding further gains.

Black-Box On-Policy Distillation of Large Language Models
AI Lab: Microsoft Research
Summary: Introduces GAD (Generative Adversarial Distillation), a black-box, on-policy distillation framework in which a student LLM learns via a discriminator that compares its outputs to a teacher's, forming an adversarial minimax game. Experiments show GAD significantly outperforms standard sequence-level distillation, achieving strong generalization and enabling small open-source models to approach GPT-5-Chat performance.

🤖 AI Tech Releases

Marble
World Labs launched Marble, its first text-to-3D world model, and it's formidable.

GPT-5.1
OpenAI launched GPT-5.1, two new models that optimize the conversational experience in ChatGPT.

Omnilingual ASR
Meta released a series of automatic speech recognition models that enable speech intelligence capabilities for over 1,600 languages.
ERNIE-4.5-VL-28B-A3B-Thinking
Baidu released ERNIE-4.5-VL-28B-A3B-Thinking, a new multimodal model optimized for reasoning workflows.

JAX-Privacy 1.0
Google open sourced JAX-Privacy, a library for differentially private AI.
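To make the JAX-Privacy release above concrete: the core mechanism that differentially private training libraries automate is DP-SGD, which clips each example's gradient and adds calibrated Gaussian noise before averaging. The sketch below shows that mechanism in plain NumPy; the function name and parameters are illustrative, not JAX-Privacy's actual API.

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step (illustrative sketch, not the JAX-Privacy API):
    clip each per-example gradient to `clip_norm`, sum, add Gaussian noise
    scaled by `noise_multiplier * clip_norm`, and average."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds the clip threshold;
        # this bounds each example's influence on the update.
        clipped.append(g * min(1.0, clip_norm / max(norm, 1e-12)))
    total = np.sum(clipped, axis=0)
    # Gaussian noise calibrated to the clipping bound is what yields
    # (epsilon, delta)-DP guarantees when tracked by a privacy accountant.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)

# Example: three per-example gradients for a 2-parameter model.
grads = [np.array([3.0, 4.0]), np.array([0.1, -0.2]), np.array([-1.0, 1.0])]
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.5)
print(update.shape)  # (2,)
```

In a real JAX-Privacy pipeline the clipping and noising are fused into the training step and paired with an accountant that converts `noise_multiplier` and the number of steps into a privacy budget; the sketch only shows the per-batch arithmetic.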