This Week in Turing Post:
|
|
From our partners: Implement an identity framework for securing AI agents
|
AI agents are shipping fast and breaking core security assumptions. Agentic workflows introduce anonymous execution, credential sprawl, excessive privilege, poor auditability, and brittle controls. Join Teleport to unpack why legacy identity fails for agentic AI and what AI-ready infrastructure actually requires.
|
Our news digest is always free. Click on the partner's link above to support us. Upgrade to receive our deep dives in full, directly in your inbox. Join Premium members from top companies like Nvidia, Hugging Face, Microsoft, Google, and a16z, plus AI labs and institutions such as Ai2, MIT, Berkeley, .gov, and thousands of others, to really understand what's going on with AI.
|
|
What an insane week: Claude and ChatGPT launches, markets spiraling down, and an overpacked Clawdbot meetup in SF (check the "News from the usual suspects" section). But what really caught my attention was the future painted by Elon Musk:
Living Inside Kardashev's Head
On February 2, 2026, SpaceX published an update announcing that xAI had joined SpaceX. Buried inside the announcement was a line that would have sounded absurd even five years ago: this merger, the company said, is a first step toward becoming a Kardashev Type II civilization. |
Pause here for a second. |
A Soviet astrophysicist working in the 1960s, in the middle of the Cold War, when radio astronomy and SETI were still young and he was thinking about extraterrestrial intelligence, has become a reference point for a real capital-allocation plan in 2026. Kardashev was a brilliant physicist, no doubt, but much of his framework was necessarily speculative. Well, we are not in theory anymore: we are watching rockets fly, satellites launch, factories expand, and grid demand spike, all while Kardashev is invoked as if he were an internal strategy memo. What a peculiar turn of events!
What Kardashev Meant |
Kardashev was not trying to predict the future of humanity. He was trying to solve a detection problem. If advanced civilizations exist, how would we notice them? He thought: look for energy. A civilization capable of large-scale engineering will leave thermodynamic footprints. Waste heat, infrared glow, star-scale manipulation. |
He proposed a simple classification: |
Type I civilizations harness planetary-scale energy. Type II harness the energy of their star. Type III operate on galactic scales.
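For intuition, the scale is often made continuous using Carl Sagan's interpolation, K = (log10 P − 6) / 10 with P in watts, so Type I sits at roughly 10^16 W and Type II at roughly 10^26 W. A minimal sketch (the ~2e13 W figure for humanity's current consumption is an assumed ballpark, not from Kardashev or the SpaceX announcement):

```python
import math

def kardashev_rating(power_watts: float) -> float:
    """Carl Sagan's continuous interpolation of the Kardashev scale:
    K = (log10(P) - 6) / 10, with P in watts.
    Type I ~ 1e16 W (planetary), Type II ~ 1e26 W (stellar),
    Type III ~ 1e36 W (galactic)."""
    return (math.log10(power_watts) - 6) / 10

print(kardashev_rating(1e16))  # 1.0 -> Type I
print(kardashev_rating(1e26))  # 2.0 -> Type II
# Humanity today: roughly 2e13 W (~20 TW), an assumed ballpark figure
print(round(kardashev_rating(2e13), 2))  # ~0.73
```

On this interpolation, humanity sits somewhere around 0.7, which is what makes "a first step toward Type II" such a striking claim.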
|
For decades, the Kardashev scale lived comfortably in the sci-fi and SETI corner because nothing we were building looked remotely relevant. Our technologies were clever, but light. Software-heavy, energy-light. |
Not that anyone expected it, but AI changed that equation.
Intelligence Has Grown a Power Bill |
The SpaceX update makes a simple claim, almost in passing: current advances in AI depend on large terrestrial data centers, and global electricity demand for AI cannot be met without imposing hardship on communities and the environment. |
Taken at face value, this is an admission that intelligence has become infrastructure. It consumes electricity at scale and competes with households, cities, and industry for grid capacity. |
Once a technology reaches this stage, progress is no longer gated by ideas alone. It becomes gated by permitting, supply chains, land, and energy. |
What "Moving Toward Type II" Means in Practice
Freeman Dyson, an American physicist, speculated that a sufficiently advanced civilization might capture stellar energy by building a vast structure around its star. The image of a "Dyson sphere" stuck, and with it the impression that using solar-scale energy requires fantastical megastructures.
We are not building that. |
Moving toward Type II, today, means three very specific things: |
First, energy becomes the limiting factor for intelligence. Access to cheap, continuous power at scale is what matters most. This is why AI shows up in utility forecasts, transformer shortages, and regional politics. Once intelligence hits the grid, the grid pushes back.
Second, the geometry of infrastructure starts to matter. On Earth, energy is seasonal, regulated, land-constrained, and socially contested. In orbit, solar power is near-constant and space is abundant. "It's always sunny in space!" changes where the bottleneck lives.
Third, logistics replaces invention as the hard problem. Starship matters less because it can reach Mars and more because it is meant to move mass repeatedly, cheaply, and on schedule. That changes what is possible.
A civilization does not move toward Type II by inventing one breakthrough device (or coding platform). It moves there by building systems that can move material and energy at scale, over and over again, without stopping.
|
Seen this way, Starlink, Starship, xAI, and orbital compute form a coherent story: intelligence demands energy, energy demands infrastructure, and infrastructure demands scale that Earth increasingly struggles to absorb. |
The Uncomfortable Part Kardashev Never Addressed |
Kardashev gave us a ruler, but he never really thought about governance. After all, he lived in the Soviet Union and assumed, I suppose, that the USSR would be in control. That raises big questions. If intelligence becomes an energy-intensive utility, then control over energy-to-compute pipelines becomes control over agency. Vertical integration stops being a business strategy and starts becoming a civilizational lever.
The scale does not tell us who should own that substrate, how access should be governed, or how tradeoffs between growth and environmental stability should be handled. It only tells us that capability tracks energy. |
That is why invoking Kardashev today is both clarifying and unsettling. It reframes progress in physical terms, but it also exposes how little social machinery we have built around that reality. |
Why This Moment Feels Surreal |
Kardashev thought his scale would help us notice aliens. |
Instead, it is helping us notice ourselves. |
It's almost shocking that his core assumption, that civilization advances by commanding more energy, has reasserted itself as a practical constraint on modern AI.
And the real question is no longer whether Kardashev was right, but whether we are prepared for what it means to organize intelligence, infrastructure, and power on that scale without losing control of the systems we are building. Is that looking too far into the future? I no longer know.
But everything we see correlates with the trend the research papers also show (see the research section below): it's not about a model anymore, it's about systems. About energy, throughput, memory, data movement, deployment surfaces, and the long-lived infrastructure that sits underneath intelligence and shapes what it can actually do.
We are watching a shift from optimizing architectures to organizing capacity. |
|
|
We are watching/reading: |
|
When AI Agents Start Hiring Humans: The Meatspace Layer Explained
|
|
|
|
News from the usual suspects |
Claude Opus 4.6 in Claude Code vs. OpenAI GPT-5.3-Codex: people can't decide which is better.
Claude Opus 4.6, Incrementally Better
Anthropic launched Claude Opus 4.6, an update focused on more consistent reasoning, improved tool use, and better performance on long-context tasks. The release avoids bold claims and flashy benchmarks, instead emphasizing reliability and steady progress. It fits Anthropic's broader pattern: iterate carefully, prioritize trust, and let adoption do the talking. The most interesting case so far: building a C compiler with a team of parallel Claudes.

GPT-5.3-Codex Expands the Scope of Codex
OpenAI introduced GPT-5.3-Codex, an updated model that combines improved coding performance with broader agentic and professional task support. The release focuses on longer-running tasks, better tool use, and more reliable computer interaction, positioning Codex as something closer to a general work agent than a coding assistant. OpenAI also emphasized internal use, noting material changes in how its own teams operate.
|
More from OpenAI |
ChatGPT Tests Ads, Promises a Firewall
OpenAI began testing ads in ChatGPT for logged-in adult users in the U.S. on the Free and Go tiers. Paid tiers (Plus/Pro/Business/Enterprise/Education) stay ad-free. OpenAI says ads are labeled, kept separate from answers, and do not affect responses; advertisers get only aggregate performance data. Users can manage personalization and delete ad data.

OpenAI Goes Agent-First, on Purpose
In a widely circulated post, OpenAI president Greg Brockman outlined an internal shift toward agentic software development. The goal: agents as the default interface for technical work, replacing editors and terminals where possible. The guidance is notably operational (roles, documentation, infra, and accountability), suggesting this is less a vision statement than an execution plan.
|
More from Anthropic |
Agentic Coding Grows Up
A new 2026 Agentic Coding Trends Report argues that software development is shifting from writing code to orchestrating agents. The report highlights coordinated multi-agent systems, long-running agents, and scaled human oversight as the real levers of change. The message is restrained: productivity gains are real, but durable advantage comes from structure, supervision, and security, not full automation.

Anthropic Triggers a Market Repricing
Anthropic's release of Claude Opus 4.6 and its broader push toward long-running, agentic coding systems prompted a sharp selloff across publicly traded AI tooling and dev-infrastructure companies. Investors reacted less to raw benchmarks than to pricing pressure and the implication that large labs are moving directly into territory once reserved for startups. The move forced a fast reassessment of defensibility across the AI software stack.
|
Cursor Experiments With Self-Driving Codebases
Cursor published detailed research on running large numbers of autonomous coding agents continuously, showing how thousands of agents can coordinate to maintain and evolve a codebase with limited human oversight. The work focuses less on model capability and more on system design: roles, delegation, error tolerance, and throughput. The takeaway is pragmatic: autonomy works, but only with careful structure and clear intent.
|
|
Paper Highlight
First Proof
|
Researchers from Stanford University, Columbia University, EPFL, Imperial College, Yale University, Harvard University, and other institutions propose a methodology to evaluate LLMs on genuine research-level mathematics. They release ten unpublished math questions spanning algebra, topology, analysis, and numerical linear algebra, each solvable with a short proof that is not available online. Answers are encrypted temporarily to prevent data contamination. Initial one-shot tests show frontier AI systems struggle, motivating development of a future benchmark → read the paper
Foundation Model Tech Reports
Kimi K2.5: Visual Agentic Intelligence – Integrates joint text–vision pretraining and reinforcement learning with parallel agent orchestration to enable scalable multimodal agentic intelligence → read the paper
ERNIE 5.0 Technical Report – Trains a unified autoregressive multimodal foundation model with elastic ultra-sparse MoE routing to support flexible deployment across scale and resource constraints → read the paper
|
Research this week |
(as always, ★ indicates papers that we recommend paying attention to)
This week is about turning intelligence into infrastructure: |
Agents are becoming population-based and modular.
RL is becoming data-scalable and behavior-aware.
Memory, attention, and retrieval are being treated as policies.
SWE and GUI are the real stress tests.
Systems work is setting the ceiling for everything else.
|
Reinforcement learning, post-training, and alignment mechanics |
★ Golden Goose: Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text – One of the most strategically important RL papers right now. It breaks the data bottleneck for RLVR by exploiting unverifiable text at scale → read the paper
★ Reinforced Attention Learning – Shifts optimization from tokens to attention distributions. This is a real conceptual step forward for multimodal post-training → read the paper
★ Rethinking the Trust Region in LLM Reinforcement Learning – Argues PPO-style clipping is structurally wrong for LLMs and replaces it with divergence-based constraints. This will age well → read the paper
★ GRP-Obliteration: Unaligning LLMs With a Single Unlabeled Prompt – Shows that post-training safety alignment can be reliably undone using GRPO with minimal supervision, while largely preserving model utility. Important because it treats alignment as reversible behavior, not a stable property, and uses the same RL machinery the field relies on for capability gains → read the paper
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare – Fixes rare-solution collapse in group-based RL. A clean, incremental improvement with real gains → read the paper
SLIME: Stabilized Likelihood Implicit Margin Enforcement – Addresses unlearning and formatting collapse in preference optimization. Solid alignment-hygiene work → read the paper
Self-Hinting Language Models Enhance Reinforcement Learning – Uses privileged hints during training to prevent GRPO collapse, then removes them at test time. Clever and practical → read the paper
Good SFT Optimizes for SFT, Better SFT Prepares for RL – Important reminder that SFT quality should be judged by downstream RL performance, not standalone metrics → read the paper
On the Entropy Dynamics in Reinforcement Fine-Tuning of LLMs – Theory-heavy but useful for understanding why entropy-control methods behave the way they do → read the paper
|
Agentic systems, self-improvement, orchestration |
★ Group-Evolving Agents: Open-Ended Self-Improvement via Experience Sharing – Group-level evolution beats tree-style self-evolution by actually reusing exploratory diversity. One of the clearest signals that agent learning is shifting from "single mind" to "population dynamics" → read the paper
★ AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration – Formalizes agents as composable tuples and treats sub-agents as dynamically instantiated tools. This is quietly one of the most practical orchestration abstractions this year → read the paper
★ MARS: Modular Agent with Reflective Search for Automated AI Research – Budget-aware planning plus reflective memory for research agents. Important because it treats research as a cost-constrained search problem, not a prompt-engineering task → read the paper
WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent RL – Argues that width, not depth, is the right scaling axis for broad search. Strong empirical signal that parallelism beats ever-longer chains → read the paper
daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently – Uses real-world PR sequences as supervision for long-horizon agency. Interesting mainly as a data lens, less as a general framework → read the paper
MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents – Treats memory operations as learnable skills that themselves evolve. Fits the broader shift toward memory-as-policy → read the paper
RE-TRAC: Recursive Trajectory Compression for Deep Search Agents – Cross-trajectory reflection instead of linear ReAct loops. A clean fix for local-optimum collapse in deep research agents → read the paper
|
Software engineering agents and verifiable environments |
★ SWE-Universe: Scale Real-World Verifiable Environments to Millions – One of the most important infrastructure papers of the week. Million-scale verifiable SWE environments change what mid-training and RL can even mean for coding agents → read the paper
★ SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training – A transparent, end-to-end recipe for building strong SWE agents. Valuable because it's reproducible and explicit about the full pipeline → read the paper
MEnvAgent: Scalable Polyglot Environment Construction for Verifiable SWE – Solves the multi-language environment bottleneck. Less flashy, but very necessary if SWE agents are to generalize beyond Python → read the paper
SWE-World: Building Software Engineering Agents in Docker-Free Environments – Replaces real execution with learned surrogates. Important mainly for cost and scalability tradeoffs → read the paper
Closing the Loop: Universal Repository Representation with RPG-Encoder – Treats repo comprehension and generation as inverse processes. Strong representation idea that complements SWE agents nicely → read the paper
|
World models, reasoning, and long-horizon cognition |
★ Reinforcement World Model Learning for LLM-based Agents – Aligns simulated and real next states instead of predicting tokens. This is a strong move away from brittle next-token world models → read the paper
Self-Improving World Modelling with Latent Actions (SWIRL) – Learns world models without action labels by treating actions as latent. Conceptually elegant and broadly applicable → read the paper
InftyThink+: Infinite-Horizon Reasoning via RL – Optimizes when and how to summarize reasoning, not just how long to think. Good evidence that CoT scaling needs structure → read the paper
No Global Plan in Chain-of-Thought – Shows LLMs plan locally, not globally. Useful as a diagnostic lens rather than a training recipe → read the paper
Research on World Models Is Not Merely Injecting World Knowledge – A meta-paper, but an important one. Argues for world models as unified systems, not task-specific hacks → read the paper
|
Multimodality, GUI agents, and perception-control loops |
★ POINTS-GUI-G: GUI-Grounding Journey – One of the clearest demonstrations that RL works extremely well for perception-heavy tasks when rewards are verifiable → read the paper
Generative Visual Code Mobile World Models – Predicts GUI states as executable code instead of pixels. Very strong idea for mobile and UI agents → read the paper
Training Data Efficiency in Multimodal Process Reward Models – Shows most MPRM data is redundant and how to select informative subsets cheaply → read the paper
|
Model architecture, efficiency, and scaling |
★ Horizon-LM: A RAM-Centric Architecture for LLM Training – Redefines the CPU–GPU boundary and makes 100B+ training feasible on a single node. This is a serious systems contribution → read the paper
OmniMoE: Atomic Experts at Scale – Pushes MoE granularity to the extreme while fixing the systems bottlenecks. Strong system–algorithm co-design → read the paper
HySparse: Hybrid Sparse Attention with KV Cache Sharing – Uses full attention as an oracle and reuses KV cache. Very clean design, very practical → read the paper
OmniSIFT: Modality-Asymmetric Token Compression – One of the better token-compression papers for omni-modal models, with real latency wins → read the paper
FASA: Frequency-aware Sparse Attention – Discovers functional sparsity in RoPE frequencies. Elegant and surprisingly effective → read the paper
|
That's all for today. Thank you for reading! Please share this newsletter with colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
How did you like it? |
|
|