In partnership with
This Week in Turing Post:
Wednesday / AI 101 series: Guardian Models
Friday / Interview: Ulrik Stig Hansen, co-founder of Encord
Our news digest is always free. Click on the partner's link to support us, or Upgrade to receive our deep dives in full, directly in your inbox. Join Premium members from top companies like Hugging Face, Microsoft, Google, a16z, and Datadog, plus AI labs and institutions such as Ai2, MIT, Berkeley, .gov, and thousands of others to really understand what's going on with AI →
Don't stop at the editorial – the research papers are phenomenal this week
Now, to the main topic: How People Use ChatGPT
This Monday, I had a different topic in mind – two, actually. I was debating whether to cover causal attention (causal AI is something I follow diligently) or the state of hallucinations (two great papers dropped last week). But suddenly OpenAI started their Monday publishing: a 63-page report with hard numbers about how people use ChatGPT.
First, I got excited about the insights. I actually printed it out so I could use my pink pencil to underline the most interesting things. Then I read it, highlighting the parts that caught my attention. And then I used ChatGPT to clarify those points and check whether my questions held up.
The mystery is why the researchers behind this report didn't read it closely themselves – or at least run it through ChatGPT. Might've saved them a few embarrassing moments.
Suggested prompt: i have a few doubts about this report, what inconsistencies/inaccuracies/faults can you spot?
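If you want to replicate the exercise, here's a minimal sketch using the OpenAI Python SDK. The file name and the model choice are placeholders, not anything the report prescribes:

```python
# Minimal sketch: run the consistency-check prompt over a local copy of the
# report. "report.txt" and the model name are placeholder assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

report_text = open("report.txt").read()  # hypothetical local text dump of the PDF

response = client.chat.completions.create(
    model="gpt-4o",  # any capable model you have access to
    messages=[
        {"role": "system", "content": "You are a careful technical reviewer."},
        {"role": "user", "content": (
            "i have a few doubts about this report, what "
            "inconsistencies/inaccuracies/faults can you spot?\n\n" + report_text
        )},
    ],
)
print(response.choices[0].message.content)
```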
There are indeed a bunch of inconsistencies scattered around – most of them not catastrophic, but each one chips away at the credibility. And then there's a bigger flaw that undermines the whole report.
Let me demonstrate:
They repeat throughout the report, in different words, the following: "As of July 2025 about 70% of ChatGPT consumer queries were unrelated to work; while both work-related and non-work-related queries have been increasing, non-work queries have been increasing faster."
But then there's also a footnote: "Our sample includes the three consumer plans (Free, Plus, or Pro). OpenAI also offers a variety of other ChatGPT plans (Business fka. Teams, Enterprise, Education), which we do not include in our sample."
If you look at it strictly as a consumer usage report – then yes, it makes sense they cut out Teams, Business, Enterprise, and Education accounts. Those are not "consumer plans," they're workplace products. So the paper isn't wrong for excluding them. But then how can you make any conclusions about work vs non-work usage?! It's like writing a report on how people eat pizza – and then only counting take-out orders from Domino's, while leaving out every slice eaten in restaurants, at school cafeterias, or office parties.
Where it gets confusing is in the framing. The title and conclusion position it as How People Use ChatGPT – full stop – when in fact it's really How Consumers Use ChatGPT. That missing qualifier changes how you read the findings:
"70% of usage is non-work" is true for Free/Plus/Pro users, but you can't generalize that to all usage when a giant slice of the pie – enterprise accounts, where work dominates – is off the table. The "work vs non-work" trend is real within consumer accounts, but it doesn't tell us what's happening in offices, classrooms, or enterprise workflows. Because those excluded users also use ChatGPT for both work and non-work. (A quick back-of-the-envelope below shows how much the excluded slice can move the headline number.)
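Here's that back-of-the-envelope in Python. Every number except the 70% consumer figure is invented for illustration – the point is only that the overall share depends entirely on the excluded volume:

```python
# Hypothetical volumes (NOT from the report): only the 70% consumer share is
# the paper's number; the enterprise figures are invented for illustration.
consumer_msgs = 100e6          # messages from Free/Plus/Pro accounts (assumed)
consumer_nonwork = 0.70        # the report's consumer finding
enterprise_msgs = 60e6         # excluded Business/Enterprise/Edu traffic (assumed)
enterprise_nonwork = 0.10      # assumed: work dominates on workplace plans

total = consumer_msgs + enterprise_msgs
overall_nonwork = (consumer_msgs * consumer_nonwork
                   + enterprise_msgs * enterprise_nonwork) / total
print(f"overall non-work share: {overall_nonwork:.0%}")  # ~48%, not 70%
```

Depending on how big the excluded slice really is, the true overall share could land almost anywhere – which is exactly why the headline needs the "consumers" qualifier.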
So:
If the researchers had just titled it How Consumers Use ChatGPT, no problem. Because they didn't, the report risks being quoted as "proof" that ChatGPT is mostly non-work everywhere, which isn't supported by their own sampling choices.
They say: "While most economic analysis of AI has focused on its impact on productivity in paid work, the impact on activity outside of work (home production) is on a similar scale and possibly larger."
If they claim that, they have to actually do the comparative analysis to justify it. Otherwise the comparison collapses into hand-waving.
They also say: "The fact that non-work usage is increasing faster suggests that the welfare gains from generative AI usage could be substantial."
I don't know why it's so important for them to hammer that point, but it falls apart. And it got me worked up, because if you have millions of users and millions of readers, you have to be responsible for what you say. When it's this sloppy, it's just painful, and it raises questions about credibility.
*sigh* They also launched Codex today, though. Read about it below. It's top.
Ad moment (click to support us):
How Canva, Perplexity and Notion turn feedback chaos into actionable customer intelligence
Support tickets, reviews, and survey responses pile up faster than you can read.
Enterpret unifies all feedback, auto-tags themes, and ties insights to revenue, CSAT, and NPS, helping product teams find high-impact opportunities.
• Canva: created VoC dashboards that aligned all teams on top issues.
• Perplexity: set up an AI agent that caught revenue-impacting issues, cutting diagnosis time by hours.
• Notion: generated monthly user insights reports 70% faster.
Stop manually tagging feedback in spreadsheets. Keep all customer interactions in one hub and turn them into clear priorities that drive roadmap, retention, and revenue.
Get a personalized demo
After doing 3 Wow and 1 Promise for a few weeks, we asked ourselves: Who really needs more AI news? With so much out there, attention gets stretched too thin. What matters is holding focus on the things that shape the long-term horizon.
Introducing Attention Span – starting with an explanation of something new (and the first paper) from Thinking Machines Lab. Watch it here →
Nondeterminism in LLMs Explained: Why Outputs Drift
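If you want a feel for one root cause before watching: floating-point addition isn't associative, so the reduction order inside GPU kernels (which shifts with batch size and parallelism) can change results. A tiny NumPy sketch of the general phenomenon – not Thinking Machines' actual experiment:

```python
import numpy as np

# Summing the same numbers in two different orders: mathematically identical,
# numerically not, because float32 addition is non-associative.
rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000).astype(np.float32)

s_forward = np.sum(x)         # one reduction order
s_reversed = np.sum(x[::-1])  # another order over the exact same values
print(s_forward == s_reversed)                    # often False
print(abs(float(s_forward) - float(s_reversed)))  # tiny, but nonzero drift
```

Scale that drift up through thousands of matrix multiplies and a final argmax over token logits, and you get outputs that differ run to run.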
News from The Usual Suspects ©
Also from OpenAI today: GPT-5-Codex – from code suggestions to coding agents, with no wasted tokens
Greg Brockman @gdb: "GPT-5-Codex – big improvement for long-running agentic tasks," quoting OpenAI @OpenAI (Sep 15, 2025): "We're releasing GPT-5-Codex – a version of GPT-5 further optimized for agentic coding in Codex. Available in the Codex CLI, IDE Extension, web, mobile, and for code reviews in GitHub. openai.com/index/introduc…"
Some developers complain that Codex takes longer (though it's smarter) than Claude Code – but that's actually the whole point. Codex has been trained to spend its effort where it matters. It doesn't waste tokens on trivial autocomplete tasks; it answers those quickly. But when the problem is harder, it slows down, reasons more deeply, and works longer. It's by design, and it's a very interesting feature!
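OpenAI hasn't published the mechanism, but conceptually it's an effort router. A deliberately naive toy sketch of the idea – every function, threshold, and heuristic here is invented, not OpenAI's implementation:

```python
# Toy illustration of difficulty-conditioned effort (NOT OpenAI's mechanism):
# cheap answers for trivial requests, a bigger reasoning budget for hard ones.
def estimate_difficulty(prompt: str) -> float:
    # Hypothetical stand-in; a real router would use a learned signal.
    return min(len(prompt) / 2000, 1.0)

def reasoning_budget(prompt: str, max_tokens: int = 8192) -> int:
    d = estimate_difficulty(prompt)
    return int(256 + d * (max_tokens - 256))  # trivial asks stay near the floor

print(reasoning_budget("rename this variable"))           # small budget
print(reasoning_budget("refactor module " + "x" * 4000))  # large budget
```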
Anthropic's MCP goes public
The MCP Registry has landed – an open catalog and API for discovering publicly available MCP servers. It's designed as a single source of truth, enabling both public and private sub-registries to thrive without stepping on toes. With a community-moderated model and open-source foundation, it's a foundational step toward scaling context-aware AI. A quiet launch, but one with deep roots and broad ambitions.

Oracle's Loud Pivot
After a decade of quiet infrastructure work, Oracle just shouted its way into the AI big leagues. With a record-setting compute deal in the works and AI demand visibly swelling its backlog, Oracle looks less like a dusty database vendor and more like the connective tissue of enterprise AI. It skipped the model arms race and built the rails – data, governance, and distribution – for others to ride.

Devin Goes to Eleven
Cognition AI, the team behind coding agent Devin, just raised $400M at a $10.2B valuation – up from $4B earlier this year. With ARR jumping from $1M to $73M in under a year and net burn under $20M, the numbers are as aggressive as the company culture. Long hours, layoffs, and buyouts haven't scared investors – or slowed growth. It's a hyperloop ride in both valuation and velocity. We just published a super detailed deep dive about them → read it here
Models to pay attention to
VaultGemma – train a 1B decoder-only Gemma variant fully under differential privacy, demonstrate practical DP scaling laws, and release open weights for privacy-preserving applications → read the paper (pdf)
Hunyuan-MT / Hunyuan-MT-Chimera – build multilingual translation models across 33 languages and aggregate multi-setting outputs at test time to boost robustness, achieving state-of-the-art WMT2025 performance → read the paper
mmBERT – pretrain a modern multilingual encoder on 3T tokens with annealed language learning to lift classification and retrieval in both high- and low-resource languages → read the paper
Qwen3-Next – combine gated DeltaNet and gated attention with an ultra-sparse MoE and native multi-token prediction to deliver long-context efficiency while activating ~3B of 80B parameters (a toy gating sketch follows this list) → read the paper
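For intuition on the "activating ~3B of 80B" part: in a sparse MoE, a gate picks a few experts per token, so only their parameters run. A toy top-k gate in NumPy – illustrative only, not Qwen3-Next's actual routing:

```python
import numpy as np

# Toy top-k MoE gate: with 64 experts and top-4 routing, only a small
# fraction of expert parameters is active for any given token.
def top_k_gate(logits: np.ndarray, k: int = 4) -> np.ndarray:
    idx = np.argsort(logits)[-k:]        # the k highest-scoring experts
    gate = np.zeros_like(logits)
    w = np.exp(logits[idx] - logits[idx].max())
    gate[idx] = w / w.sum()              # softmax over the selected experts only
    return gate

rng = np.random.default_rng(0)
gate = top_k_gate(rng.standard_normal(64))
print(np.count_nonzero(gate), "of", gate.size, "experts active")  # 4 of 64
```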
Interesting surveys
Image Credit: The original paper
Reinforcement learning foundations for deep research systems: A survey
Researchers from Huawei Technologies surveyed RL approaches for training deep research systems with hierarchical agents. They examined data synthesis methods like cross-document and obfuscated queries, RL techniques for long-horizon credit assignment, reward design, and multimodal reasoning, and frameworks such as GRPO and DUPO. The survey highlights system bottlenecks, coordination strategies, and benchmarks, offering a roadmap for building scalable, tool-using, and evaluation-ready agentic research systems → read the paper
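For readers new to GRPO, the core trick the survey keeps returning to fits in a few lines: score a group of rollouts per prompt and normalize rewards within the group, so no learned value critic is needed. A minimal sketch of just that advantage computation:

```python
import numpy as np

# Group-relative advantages, the heart of GRPO: rollouts that beat the
# group average get positive advantage, the rest get negative.
def group_relative_advantages(rewards: np.ndarray) -> np.ndarray:
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

rewards = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # e.g. pass/fail scores
print(group_relative_advantages(rewards))
```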
The freshest research papers, categorized for your convenience
We organize research papers by goal-oriented or functional categories to make it easier to explore related developments and compare approaches. As always, papers we particularly recommend are marked with 🌟
Agents, Tools & Environments
🌟 Tool-space interference in the MCP era: Designing for agent compatibility at scale (Microsoft) – analyze how tool catalogs interact in the Model Context Protocol ecosystem and propose ways to prevent cross-agent inefficiencies → read the paper
🌟 Paper2Agent: Reimagining Research Papers As Interactive and Reliable AI Agents (Stanford) – convert research papers into interactive MCP-based agents that can execute the original workflows and extend them → read the paper
🌟 Virtual Agent Economies (Google DeepMind) – conceptualize agent-to-agent markets and explore auction mechanisms, mission economies, and governance for steerable AI economies → read the paper
WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents – generate challenging web navigation data and train agents with long contexts and tool calls for state-of-the-art browsing → read the paper
EnvX: Agentize Everything with Agentic AI – transform GitHub repositories into autonomous agents capable of natural interaction and cross-repository collaboration → read the paper
Agentic RL & Long-Horizon Execution
🌟 Bootstrapping Task Spaces for Self-Improvement (Meta) – train models with exploratory iteration to grow task spaces and enable inference-time self-improvement across math, tool-use, and ML tasks → read the paper
Parallel-R1: Towards Parallel Thinking via Reinforcement Learning – instill parallel reasoning through curriculum and RL, using multi-path exploration as a scaffold for stronger problem solving → read the paper
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning – provide a unified framework and scaling strategy to train LLM agents for multi-turn decision making across realistic environments → read the paper
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents – stabilize learning with uncertainty-aware gradient modulation, amplifying confident correct updates while dampening unstable ones → read the paper
Staying in the Sweet Spot: Responsive Reasoning Evolution via Capability-Adaptive Hint Scaffolding – dynamically adjust problem difficulty with adaptive hints to keep training efficient and aligned with model capacity → read the paper
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents – reinforce reasoning-oriented agents with synthetic data to strengthen autonomous deep research skills → read the paper
ΔL Normalization: Rethink Loss Aggregation in RLVR – minimize gradient variance in verifiable reward training by normalizing losses for variable-length outputs → read the paper
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing – decentralize RL post-training with asynchronous rollout sharing to scale efficiently across heterogeneous hardware → read the paper
Reasoning, Hallucination & Reliability
🌟 The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs – show how small gains in per-step accuracy compound into exponential gains in task length, and why execution errors dominate over reasoning gaps (see the quick sketch after this list) → read the paper
🌟 Why Language Models Hallucinate (OpenAI) – explain hallucinations as statistical pressures from training and evaluation incentives that reward guessing, not calibrated uncertainty → read the paper
The Majority is not always right: RL training for solution aggregation – train aggregators that reconcile multiple candidate solutions into a correct answer, outperforming majority voting → read the paper
Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet – reveal that longer reasoning often increases hallucinations in fact-heavy settings, limiting test-time scaling benefits → read the paper
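The arithmetic behind that first paper's framing is worth seeing once: if each step succeeds independently with probability p, an H-step task succeeds with probability p^H, so tiny per-step gains buy exponentially longer horizons. A quick sketch:

```python
import math

# Horizon at which an H-step task's success rate p**H drops to 50%.
def horizon_at_half(p: float) -> float:
    return math.log(0.5) / math.log(p)

for p in (0.99, 0.995, 0.999):
    print(f"p={p}: ~{horizon_at_half(p):.0f} steps before success falls below 50%")
# p=0.99 -> ~69 steps, p=0.995 -> ~138, p=0.999 -> ~693
```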
Safety, Security & Robustness
🌟 Reasoning Introduces New Poisoning Attacks Yet Makes Them More Complicated (Google DeepMind) – demonstrate decomposed reasoning poison attacks that target chain-of-thought while also revealing emergent robustness → read the paper
🌟 All You Need Is A Fuzzing Brain: An LLM-Powered System for Automated Vulnerability Detection and Patching (Texas A&M University) – build an LLM-driven system that discovers and patches software vulnerabilities, validated in DARPA's AIxCC → read the paper
🌟 R²AI: Towards Resistant and Resilient AI in an Evolving World (Tsinghua) – propose a safe-by-coevolution paradigm where AI develops immunity-like resistance and resilience through adversarial feedback loops → read the paper
🌟 Statistical Methods in Generative AI – survey how statistical tools can improve reliability, fairness, and safety in generative AI pipelines → read the paper
Architectures & Training Paradigms
Guided Decoding and Its Critical Role in Retrieval-Augmented Generation – compare decoding frameworks that constrain RAG outputs to structured formats, balancing hallucination control and usability → read the paper
MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining – equip models with strong in-context ML capabilities via causal model-based pretraining and efficient serialization → read the paper
Revolutionizing Reinforcement Learning Framework for Diffusion Large Language Models – introduce trajectory-aware RL for diffusion LMs, yielding smaller yet stronger reasoning models → read the paper
🌟 Language Self-Play For Data-Free Training (Meta) – use game-theoretic self-play to let models improve without external data, showing stronger task performance than data-driven baselines → read the paper
🌟 Causal Attention with Lookahead Keys – extend causal attention with lookahead keys to blend forward-looking context without breaking autoregressive constraints → read the paper
Multimodal Reasoning & Integration
🌟 Visual Representation Alignment for Multimodal Large Language Models (KAIST) – align multimodal LLMs' vision pathways with pretrained VFMs to improve fine-grained visual reasoning → read the paper
Can Understanding and Generation Truly Benefit Together – or Just Coexist? – unify image understanding and generation through reconstruction-based RL, showing mutual improvements → read the paper
Visual Programmability: A Guide for Code-as-Thought in Chart Understanding – enable adaptive chart reasoning via code-as-thought pathways and RL-based strategy selection → read the paper
That's all for today. Thank you for reading! Please send this newsletter to your colleagues if it can help them enhance their understanding of AI and stay ahead of the curve.
How did you like it?