The Sequence Radar #554: The New DeepSeek R1-0528 is Very Impressive
The new model excels at math and reasoning.

Next Week in The Sequence:

In our series about evals, we discuss multiturn benchmarks. The engineering section dives into Anthropic's Circuits work on ML interpretability. In research, we discuss some of UC Berkeley's recent work in LLM reasoning. Our opinion section dives into the state of AI interpretability.

📝 Editorial: The New DeepSeek R1-0528 is Very Impressive

This week, DeepSeek AI pushed the boundaries of open-source language modeling with the release of DeepSeek R1-0528. Building on the foundation of the original R1 release, this update delivers notable gains in mathematical reasoning, code generation, and long-context understanding. With improvements derived from enhanced optimization and post-training fine-tuning, R1-0528 marks a critical step toward closing the performance gap between open models and proprietary counterparts such as GPT-4 and Gemini 1.5.

At its core, DeepSeek R1-0528 preserves the 671B-parameter Mixture-of-Experts (MoE) architecture, activating 37B parameters per forward pass. This design delivers high-capacity performance while optimizing for efficiency, especially in inference settings. One standout feature is its support for 64K-token context windows, enabling the model to engage with substantially larger inputs, ideal for technical documents, structured reasoning chains, and multi-step planning.

In terms of capability uplift, the model shows remarkable progress on competitive benchmarks. On AIME 2025, DeepSeek R1-0528 jumped from 70% to an impressive 87.5%, showcasing an increasingly sophisticated ability to tackle complex mathematical problems. This leap highlights not just better fine-tuning but a fundamental improvement in reasoning depth, an essential metric for models serving scientific, technical, and educational use cases.

For software engineering and development workflows, R1-0528 brings meaningful updates. Accuracy on LiveCodeBench rose from 63.5% to 73.3%, confirming improvements in structured code synthesis. The inclusion of JSON-formatted outputs and native function calling support positions the model as a strong candidate for integration into automated pipelines, copilots, and tool-augmented environments where structured outputs are non-negotiable (see the sketch after this editorial).

To ensure broad accessibility, DeepSeek also launched a distilled variant: R1-0528-Qwen3-8B. Despite its smaller footprint, this model surpasses Qwen3-8B on AIME 2024 by over 10%, while rivaling much larger competitors like Qwen3-235B-thinking. This reflects DeepSeek's commitment to democratizing frontier performance, enabling developers and researchers with constrained compute resources to access state-of-the-art capabilities.

DeepSeek R1-0528 is more than just a model upgrade: it's a statement. In an ecosystem increasingly dominated by closed systems, DeepSeek continues to advance the case for open, high-performance AI. By combining transparent research practices, scalable deployment options, and world-class performance, R1-0528 signals a future where cutting-edge AI remains accessible to the entire community, not just a privileged few.
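Since the editorial highlights JSON-formatted outputs as an integration feature, here is a minimal sketch of what that could look like through an OpenAI-compatible chat-completions client. The endpoint URL, the `deepseek-reasoner` model id, and JSON-mode support are assumptions based on common API conventions, not details confirmed in this issue.

```python
# Hedged sketch: requesting structured JSON output from DeepSeek R1-0528 through
# an OpenAI-compatible client. The base_url, model id, and JSON-mode support are
# assumptions for illustration, not details confirmed in this newsletter.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",      # placeholder credential
    base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # assumed id for the R1 reasoning model
    messages=[
        {
            "role": "system",
            "content": "Answer as a JSON object with keys 'answer' and 'steps'.",
        },
        {
            "role": "user",
            "content": "What is the sum of the first 100 positive integers?",
        },
    ],
    response_format={"type": "json_object"},  # request JSON-formatted output
)

# The message content should parse directly as JSON, which is what makes this
# mode useful in automated pipelines and tool-augmented workflows.
print(response.choices[0].message.content)
```

Function calling would flow through the same client via the standard `tools` parameter, again assuming the endpoint mirrors the OpenAI schema.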
Join Me for a Chat About AI Evals and Benchmarks:

🔎 AI Research

FLEX-Judge: Think Once, Judge Anywhere
AI Lab: KAIST AI

Learning to Reason without External Rewards
AI Lab: UC Berkeley & Yale University

Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning
AI Lab: Google DeepMind & Northwestern University

rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset
AI Lab: Microsoft Research Asia

MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs
AI Lab: Fudan University, CUHK MMLab, Shanghai AI Lab

DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
AI Lab: Carnegie Mellon University, NOVA LINCS, INESC-ID

Fine-Tuning Large Language Models with User-Level Differential Privacy
AI Lab: Google Research

🤖 AI Tech Releases

DeepSeek-R1-0528
DeepSeek released a new version of its marquee R1 model.

Anthropic Circuits
Anthropic open sourced its circuit interpretability technology.

Perplexity Labs
Perplexity released a new tool that can generate charts, spreadsheets, and dashboards.

Codestral Embed
Mistral released Codestral Embed, an embedding model specialized in coding.

🛠 AI in Production

Multi-Task Learning at Netflix
Netflix shared some details about its multi-task prediction strategy for user intent.

📡 AI Radar