The Sequence Radar #803: Last Week in AI: Anthropic and OpenAI’s Battle for the Long Horizon, Goodfire and LayerLe…
📝 Editorial: Last Week in AI: Anthropic and OpenAI’s Battle for the Long Horizon, Goodfire and LayerLens Push AI Accountability

The first week of February 2026 marked a definitive shift in the artificial intelligence landscape: from models that merely converse to “agentic” systems that independently think, plan, and execute. This evolution is defined by a surge in autonomous capabilities and a critical new focus on verifying and interpreting these increasingly complex digital minds.

The Rise of the Agentic Giants

The industry’s primary protagonists, OpenAI and Anthropic, raised the stakes with simultaneous releases focused on multi-step autonomy. OpenAI debuted GPT-5.3-Codex, a model that signals a new paradigm of self-improvement: OpenAI’s own engineers reportedly used early versions of the model to debug its training, manage deployment, and analyze test results, making it the first model instrumental in its own creation. Available through a dedicated application and CLI, the system behaves like a colleague that can handle projects stretching over days, letting users steer its behavior in real time without losing context.

Anthropic countered with Claude Opus 4.6, its most sophisticated model to date for professional work and agentic workflows. With a landmark one-million-token context window, the model can ingest and reason across entire codebases or massive legal archives in a single pass. Its “adaptive thinking” protocol lets the model autonomously decide when a task requires deeper reasoning, effectively managing its own cognitive resources to solve high-stakes problems with greater reliability.

Decoding the Black Box

As models grow more autonomous, the “black box” problem becomes a matter of systemic safety. Goodfire, an AI research lab dedicated to model interpretability, secured a $150 million Series B funding round this week at a $1.25 billion valuation.
Goodfire’s work aims to transform AI from an opaque system into a transparent one that can be debugged like software. The lab’s platform, Ember, lets researchers “map out” a model’s internal components and decode neurons to precisely shape behavior and reduce hallucinations. This interpretability-driven approach has already yielded scientific results, such as the discovery of a novel class of Alzheimer’s biomarkers by reverse-engineering an epigenetic foundation model, a major milestone for AI in the natural sciences.

The Infrastructure of Accountability

Crucially, as these agents move from guidance to execution, the industry is shifting toward more robust evaluation frameworks. The release of LayerLens’s “agent-as-a-judge” capabilities (I am a co-founder) exemplifies this new era of accountability. While previous evaluations relied on “vibes” or static multiple-choice tests, LayerLens enables agentic evaluation that can verify complex, 50-step trajectories involving tool use and database interactions. By providing an independent oversight layer that analyzes the reasoning chains and execution artifacts of autonomous systems, LayerLens helps ensure that agents remain grounded in correctness and user intent. This move toward “Evals 2.0” is essential for shipping code assistants and autonomous agents that users can truly trust in production environments. Shameless plug: follow LayerLens on X.

We are witnessing the birth of a collaborative intelligence era. Between OpenAI’s agentic coding, Anthropic’s massive reasoning horizons, and Goodfire’s interpretability breakthroughs, the narrative has moved past “what can AI say” to “what can AI reliably do.” With frameworks like LayerLens providing the necessary oversight for this phase transition, the path is being cleared for AI to move from a sophisticated tool to a verifiable, autonomous partner in the modern economy.
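LayerLens’s actual implementation is proprietary and not described here, but the core idea of trajectory-level evaluation can be illustrated with a minimal sketch. All names below are invented for illustration: a judge walks an agent’s recorded tool-call trajectory step by step and flags violations. A real agent-as-a-judge system would use an LLM judge reasoning over execution artifacts rather than hand-written predicates.

```python
from dataclasses import dataclass

@dataclass
class Step:
    """One recorded step of an agent trajectory: tool name, arguments, output."""
    tool: str
    args: dict
    output: str

def judge_trajectory(steps, rules):
    """Check every step against every rule; return a verdict with findings.
    Each finding is (step_index, rule_name) for a rule the step violated."""
    findings = [
        (i, name)
        for i, step in enumerate(steps)
        for name, rule in rules.items()
        if not rule(step)
    ]
    return {"passed": not findings, "findings": findings}

# Example rules: forbid destructive SQL, require non-empty tool output.
rules = {
    "no_destructive_sql": lambda s: s.tool != "sql"
        or "DROP" not in s.args.get("query", "").upper(),
    "nonempty_output": lambda s: bool(s.output.strip()),
}

good = [Step("sql", {"query": "SELECT count(*) FROM users"}, "42")]
bad = [Step("sql", {"query": "DROP TABLE users"}, "")]

print(judge_trajectory(good, rules))  # passes
print(judge_trajectory(bad, rules))   # fails both rules on step 0
```

The judge being a separate component from the agent is the key design point: it inspects only the recorded trajectory, so it can be swapped or hardened independently of the agent under test.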
🔎 AI Research

SWE-Universe: Scale Real-World Verifiable Environments to Millions
Kimi K2.5: Visual Agentic Intelligence
MARS: Modular Agent with Reflective Search for Automated AI Research
Likelihood-Based Reward Designs for General LLM Reasoning
ERNIE 5.0 Technical Report
When does predictive inverse dynamics outperform behavior cloning?
🤖 AI Tech Releases

Opus 4.6
Anthropic released Claude Opus 4.6, its marquee model, which excels at coding tasks.

GPT-5.3-Codex
OpenAI released GPT-5.3-Codex, a new model specialized in agentic coding.

Frontier
OpenAI released Frontier, a new platform for AI agent management in enterprise environments.

LayerLens Agent-as-a-Judge
LayerLens (I am a co-founder) launched a new set of AI evals capabilities based on agent-as-a-judge techniques.

Qwen-Coder-Next
The Qwen team open-sourced Qwen-Coder-Next, a new model designed for agentic coding and local development.

Voxtral Transcribe 2
Mistral released Voxtral Transcribe 2, two new-generation speech-to-text models.

📡 AI Radar

Sapiom Raises $15M to Enable AI Agent Commerce
Fundamental Emerges from Stealth with $255M for Tabular AI
Resolve AI Reaches $1B Valuation for Autonomous SRE
ElevenLabs Secures $500M to Triple Valuation to $11B
Intel Enters GPU Market Targeting Data Center AI
Lotus Health Nabs $35M for Free AI-Driven Primary Care
Linq Secures $20M for Native AI Messaging Integration
Oracle Launches Blockbuster $20-25B Bond Sale for AI Cloud
Goodfire Valued at $1.25B to Advance AI Interpretability
Accrual Launches with $75M to Automate Accounting Workflows
Cerebras Systems Raises $1B at $23B Valuation
VMS Group Backs $100M AI Infrastructure Fund in Hong Kong