The Sequence Radar #723: Alibaba’s Agentic Leap: Why Tongyi DeepResearch Matters
Another Chinese lab releasing impressive models.

📝 Editorial: Alibaba’s Agentic Leap: Why Tongyi DeepResearch Matters

Tongyi DeepResearch matters because it is the first fully open-source “deep research” web agent to publicly claim parity with top closed systems across a broad suite of agentic browsing benchmarks, while shipping under a permissive license with reproducible code and weights. For teams that need verifiable pipelines and on-prem deployment, this flips the script: the research loop (data → training → evaluation → inference) is documented end to end and legally usable in products, not just demos. The release also raises the bar on what “agentic” means in practice: robust long-horizon browsing, test-time scaling, and an RL-trained policy rather than fragile prompt glue.

Quick background on Tongyi: the project comes from Alibaba’s Tongyi Lab (adjacent to the Qwen stack) and is derived from a 30.5B Mixture-of-Experts architecture with ~3.3B parameters active per token (“A3B”), giving it the efficiency profile of a small model while keeping a larger expert pool for reasoning. Context length is listed at 128k. The weights and training/inference code are available under Apache-2.0 with ready-to-run scripts. If you’ve used Qwen3-30B-A3B models, the ergonomics will feel familiar; this is a specialized agentic fork aimed at long-horizon information seeking.

Technically, the standout is the fully automated synthetic-data flywheel that spans continual pre-training (Agentic CPT), supervised fine-tuning, and strictly on-policy reinforcement learning. The team describes a Group-Relative Policy Optimization (GRPO) variant with token-level policy gradients and leave-one-out advantages to stabilize training in a non-stationary web environment, paired with automated negative-sample filtering.
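The leave-one-out advantage idea is easy to sketch. Below is a minimal, illustrative version (not the team’s actual implementation; function names and the loss shape are assumptions): each rollout in a group is scored against the mean reward of the *other* rollouts, and that group-relative advantage scales every token’s log-probability in the policy-gradient loss.

```python
def leave_one_out_advantages(rewards):
    """Advantage of each rollout = its reward minus the mean reward of
    the other rollouts in the group (leave-one-out baseline)."""
    total, n = sum(rewards), len(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]

def grpo_token_loss(token_logprobs, rewards):
    """Token-level policy gradient: every token's log-prob in rollout i
    is weighted by that rollout's group-relative advantage.
    `token_logprobs[i]` is the list of per-token log-probs of rollout i."""
    advs = leave_one_out_advantages(rewards)
    loss, n_tokens = 0.0, 0
    for logprobs, adv in zip(token_logprobs, advs):
        for lp in logprobs:
            loss -= adv * lp  # maximize log-prob of high-advantage rollouts
            n_tokens += 1
    return loss / n_tokens
```

The leave-one-out baseline keeps each rollout’s advantage independent of its own reward, which reduces bias relative to including itself in the group mean.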
Inference supports two regimes: a vanilla ReAct path to audit core capabilities, and a “Heavy” mode (test-time scaling) that layers iterative planning to push performance ceilings. This combination of purpose-built synthetic data, on-policy RL, and selectable inference regimes is the core engineering contribution.

On empirical results, Tongyi DeepResearch reports state-of-the-art or parity scores on major agentic browsing suites: 32.9 on Humanity’s Last Exam (HLE), 43.4 on BrowseComp, 46.7 on BrowseComp-ZH, and 75 on xBench-DeepSearch, with additional wins across WebWalkerQA and related sets. The claim is that it systematically outperforms existing proprietary and open-source “deep research” agents in the reported settings. As always, caveats apply (benchmarks vary in tooling, retries, and orchestration), but the breadth of public numbers and open weights makes third-party replication feasible.

Design-wise, the rollout loop emphasizes “synthesis and reconstruction”: after each browsing cycle the agent distills essential artifacts into a compact workspace and a continually evolving central report before deciding whether to gather more evidence or finalize an answer. Beyond benchmarks, Alibaba lists live deployments, e.g., “Xiao Gao” in Amap (Gaode) for multi-step travel planning and a legal research agent (FaRui) that grounds outputs in verifiable citations: use cases that stress tool orchestration and citation hygiene, not just token-level reasoning.

Why this release is significant: it operationalizes an open, reproducible recipe for long-horizon agents (data generation → training → RL → inference) that enterprises can inspect, fork, and harden, rather than treating agents as a prompt template on top of a closed API.
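For readers unfamiliar with the vanilla ReAct path, here is a minimal sketch of that loop. Everything in it (the `llm` callable, the transcript format, the `Action: tool[arg]` syntax) is a hypothetical stand-in for illustration, not Tongyi’s actual interface:

```python
import re

def parse_action(step):
    """Extract `Action: tool[argument]` from a model step (illustrative format)."""
    m = re.search(r"Action:\s*(\w+)\[(.*)\]", step)
    if m is None:
        raise ValueError(f"no action in step: {step!r}")
    return m.group(1), m.group(2)

def react_agent(question, llm, tools, max_steps=20):
    """Vanilla ReAct loop: Thought -> Action -> Observation, repeated
    until the model emits `Final Answer:` or the step budget runs out."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)          # model produces the next Thought/Action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        name, arg = parse_action(step)  # dispatch to the named tool
        transcript += f"Observation: {tools[name](arg)}\n"
    return None                          # budget exhausted without an answer
```

“Heavy” mode can be thought of as wrapping a loop like this in extra planning and synthesis passes at test time, trading additional compute for accuracy.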
The Apache-2.0 licensing and full-stack availability lower adoption friction; the MoE-A3B efficiency makes serious research loops economically plausible; and the explicit limitations (context still capped, scaling to larger backbones pending, RL efficiency to improve) give a credible roadmap for community contributions. In short, Tongyi DeepResearch resets expectations for what a “serious” open agent looks like, and gives practitioners something they can run, measure, and ship today.

🔎 AI Research

Title: The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs
AI Lab: University of Cambridge; Institute for AI, University of Stuttgart; Max Planck Institute for Intelligent Systems; ELLIS Institute; University of Southampton; Tübingen AI Center
Summary: The paper shows that small gains in single-step accuracy compound into large, even faster-than-exponential, improvements in the task length models can execute, and identifies “self-conditioning” (models amplifying their own past mistakes) as a key failure mode in long-horizon execution. Thinking models and test-time sequential compute mitigate self-conditioning and dramatically extend single-turn execution length, with frontier reasoning models outperforming non-thinking counterparts by large margins.

Title: Virtual Agent Economies
AI Lab: Google DeepMind (with contributors from University of Toronto)
Summary: The authors propose “sandbox economies” for AI agents: intentional, steerable markets with controllable permeability to the human economy, designed to harness coordination benefits while managing systemic risk. They outline design tools such as auctions for fair allocation, mission economies, and trust infrastructure (e.g., verifiable credentials) to build safe, accountable, and socially aligned agent markets.

Title: Towards General Agentic Intelligence via Environment Scaling
AI Lab: Tongyi Lab, Alibaba Group
Summary: The paper introduces a scalable pipeline that programmatically builds fully simulated tool-use environments and then trains agents via a two-stage experience-learning regimen (general tool use → domain specialization), yielding verifiable trajectories. Experiments on τ-bench, τ²-Bench, and ACEBench show that the AgentScaler models significantly improve function-calling capability and reach parity with much larger or closed models in several cases.

Title: WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning
AI Lab: Tongyi Lab, Alibaba Group
Summary: WebSailor-V2 pairs a new SailorFog-QA-2 dataset (dense graph-based uncertainties) with a dual-environment RL setup (a fast simulator plus a robust managed real web) to train a 30B-A3B MoE agent. The system achieves SOTA on BrowseComp-EN/ZH and HLE, surpassing prior open-source agents and rivaling proprietary systems.

Title: Scaling Laws for Differentially Private Language Models
AI Lab: Google Research & Google DeepMind
Summary: This work derives compute–privacy–utility scaling laws for DP LLM training, providing prescriptions for how to allocate compute among model size, batch size, and iterations under fixed privacy and data budgets. A key finding is that DP-optimal configurations favor much smaller models and very large batches, with additional compute yielding little benefit unless accompanied by more privacy budget or data.

Title: Tool-space interference in the MCP era: Designing for agent compatibility at scale
AI Lab: Microsoft Research
Summary: The blog highlights how the rapid adoption of the Model Context Protocol (MCP) has created a thriving ecosystem of interoperable tools, but has also introduced “tool-space interference,” where multiple agents and tools working together can inadvertently reduce one another’s effectiveness. Microsoft researchers propose early design strategies to mitigate these issues, enabling heterogeneous agents to cooperate at scale rather than hinder one another.
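The compounding claim in the first paper above has simple arithmetic behind it. Under a toy independence assumption (each step succeeds with probability p, and the task requires every step), the executable horizon at a target success rate s is H = ln(s)/ln(p), so small gains in per-step accuracy stretch the horizon dramatically. A sketch of this baseline model (the paper’s actual analysis is richer, including self-conditioning effects):

```python
import math

def horizon(step_accuracy, target_success=0.5):
    """Longest task length (in steps) completed with probability >= target_success,
    assuming independent steps that each succeed with `step_accuracy`.
    Solves step_accuracy ** H = target_success for H."""
    return math.log(target_success) / math.log(step_accuracy)
```

Moving per-step accuracy from 99% to 99.9%, a gain of under one point, lengthens the 50%-success horizon roughly tenfold (from about 69 steps to about 693), which is why small single-step gains look like diminishing returns on static benchmarks but are not.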
🤖 AI Tech Releases

Tongyi DeepResearch
Alibaba Tongyi open sourced a new autonomous research agent.

IBM Granite Docling
IBM released Granite Docling, a foundation model optimized for document understanding.

📡 AI Radar