Reasoning Models have become a major part of AI community discussions and debates, gaining more attention every day since the appearance of OpenAI's o1 model and then the open-sourced DeepSeek-R1. The topic has already been picked apart from every angle, which is a sign that we are dealing with a true breakthrough. As Nathan Lambert claims in his blog, "this year every major AI laboratory has launched, or will launch, a reasoning model" because these models can solve "the hardest problems at the frontier of AI" far better than previous-generation models.

While we already have a lot of information about Reasoning Models, and every week new papers introduce approaches for improving the "reasoning" capabilities of models, surprisingly, we still don't have a solid definition of what exactly a Reasoning Model is. The main debate is whether reasoning models even deserve to be a separate category. But let's call things what they are.

Today, we're going to take a deep dive into what exactly counts as a Reasoning Model, which models fall into that category, and where this field might be headed next to unlock upcoming AI possibilities. So let's get started.

In today's episode, we will cover:

- Controversial nature of Reasoning Models
- What makes Reasoning Models different?
- Models which fall into the "Reasoning" category
- Limitations: The problems of overthinking and more
- Future perspectives
- Conclusion
- Sources and further reading
Controversial nature of Reasoning Models

The rise of Reasoning Models has sparked a lively debate in the AI field: should these "thinking" models be considered a separate category of AI, distinct from plain LLMs? Or are they essentially the same core technology with some clever add-ons? Let's look at both sides of this issue.

Proponents argue that models optimized specifically for reasoning represent a qualitative leap from LLMs, despite being built on them. Thanks to new concepts, such as a dedicated reinforcement learning (RL) phase during training and increased inference-time compute, reasoning models unlock new capabilities, which are used as arguments for placing them in a separate category:

- Performance leaps: Models like OpenAI's o1 and o3 outperform traditional LLMs on hard, long-horizon tasks, like solving 20-step math problems or generating complex code. Smaller Reasoning Models outperform larger LLMs on reasoning benchmarks.
- Industry adoption: Many major AI labs released reasoning-focused models in 2024–2025.
- Branding shift: Models are explicitly marketed as "reasoners", emphasizing that they are different from general chatbots. These models are designed not just for output generation, but for reasoning, planning, and tool use, the early foundations of agentic AI.
Nathan Lambert, who has emerged as a key explainer of Reasoning Models, suggests calling them "Reasoning Language Models" (RLMs) and notes that their emergence has "muddied" the old taxonomy of pre-training vs. fine-tuning. He also emphasizes how RLMs have redefined the post-training landscape and how Reinforcement Learning with Verifiable Rewards (RLVR) has driven the main advancements in model capabilities. (Check the Sources section for all links.)

While Andrej Karpathy hasn't explicitly argued for or against a new category, his observations highlight that something qualitatively different is happening inside these RL-trained models: "It's the solving strategies you see this model use in its chain of thought. It's how it goes back and forth thinking to itself. These thoughts are emergent (!!!) and this is actually seriously incredible," he wrote in his post on Twitter on January 27, 2025.

Many academic studies, like "Reasoning Language Models: A Blueprint", echo the idea of separating Reasoning Models from LLMs, describing them as a "transformative breakthrough in AI, on par with the advent of ChatGPT" that moves AI closer to general problem-solving and possibly AGI.

On the other side, many remain cautious about overhyping Reasoning Models as something fundamentally new. Under the hood, they are still transformer-based language models using the same next-token prediction objective. Their improved performance comes not from architectural breakthroughs but from methods like supervised fine-tuning, reinforcement learning, chain-of-thought prompting, longer inference runs, and carefully curated training data.

So the main counterpoints include:

- No architectural change: Reasoning ability is a product of optimized training regimes (e.g., SFT, RLHF, ReFT), inference-time scaffolding (like CoT and majority voting; see the sketch after this list), and increased compute, not new model designs. They are still standard autoregressive LLMs.
- Limited generalization: High performance in math, code, or logic puzzles doesn't generalize to open-ended, commonsense, or causal reasoning. Models often fail on tasks that require real-world understanding, novel abstraction, or reasoning under uncertainty. ARC, counterfactuals, and long-horizon tasks remain challenging.
- Branding vs. substance: Terms like "reasoner" risk overstating capabilities and fueling AGI hype.
- Narrow domains: Some reasoning-tuned models underperform on general language tasks like storytelling, open-ended dialogue, or real-world question answering. Their specialization can lead to regressions in fluency, creativity, or common sense, casting doubt on how "general" their reasoning really is.
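To make the "inference-time scaffolding" point concrete, here is a minimal sketch of chain-of-thought sampling with majority voting (self-consistency). It assumes a generic `generate(prompt, temperature)` completion function; the prompt wording and the answer-extraction pattern are our own illustrative choices, not a description of any particular lab's setup.

```python
import re
from collections import Counter

def extract_answer(completion: str) -> str | None:
    """Pull the final answer from a completion that ends with 'Answer: <value>'."""
    match = re.search(r"Answer:\s*(.+)", completion)
    return match.group(1).strip() if match else None

def self_consistency(generate, question: str, n_samples: int = 8) -> str | None:
    """Sample several chains of thought and return the most common final answer."""
    prompt = f"{question}\nLet's think step by step, then end with 'Answer: <value>'."
    answers = []
    for _ in range(n_samples):
        # Higher temperature -> more diverse reasoning paths to vote over.
        completion = generate(prompt, temperature=0.8)
        answer = extract_answer(completion)
        if answer is not None:
            answers.append(answer)
    return Counter(answers).most_common(1)[0][0] if answers else None
```

Nothing here changes the underlying model: the extra reliability comes purely from spending more inference-time compute and aggregating over sampled chains of thought, which is exactly the skeptics' point.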
Melanie Mitchell, Professor at the Santa Fe Institute, questions whether reasoning models reflect real understanding or simply mimic learned heuristics: "The performance of these models on math, science, and coding benchmarks is undeniably impressive. However, the overall robustness of their performance remains largely untested, especially for reasoning tasks that, unlike those the models were tested on, don't have clear answers or cleanly defined solution steps" (from the article "Artificial intelligence learns to reason").

But do we really need a completely new architecture with best-in-class capabilities across all domains to define a model as a new type? Maybe we should admit that sometimes new concepts, new paradigms, and a new view of what we already have can open up a new family of models specialized for certain tasks.

From the Turing Post side, we do define Reasoning Language Models (RLMs) as one of the model types. Here is why it makes sense.

Note: In the post below, we called Reasoning Models LRMs, or Large Reasoning Models, but taking into account the range of sizes Reasoning Models already come in, it's better to define the entire group as "Reasoning Language Models", or RLMs.

TuringPost @TheTuringPost:
12 Foundational AI Model Types: LLM, SLM, VLM, MLLM, LAM, LRM, MoE, SSM, RNN, CNN, SAM, LNN. Save the list and check this out for explanations and links to useful resources: huggingface.co/posts/Kseniase… (Jun 8, 2025)
What makes Reasoning Models different?

If we gather the main features of RLMs into one definition, we would say that they are advanced AI systems specifically optimized for multi-step logical reasoning, complex problem-solving, and structured thinking. RLMs incorporate test-time scaling, RL post-training, Chain-of-Thought reasoning, tool use, external memory, strong math and code capabilities, and a more modular design for reliable decision-making.

Let's break down what stands behind each feature step by step:
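Before that breakdown, one of these ingredients is easy to make concrete in code: the verifiable-reward idea behind RL post-training (RLVR), mentioned earlier. Instead of a learned preference model, the reward is a programmatic check of the final answer, plus an optional format bonus. This is a minimal sketch under our own assumptions; the tag convention, function names, and scores are illustrative, not any lab's actual recipe.

```python
import re

def format_reward(response: str) -> float:
    """Small bonus if the response uses the expected reasoning/answer format (assumed convention)."""
    ok = "<think>" in response and "</think>" in response and "Answer:" in response
    return 0.1 if ok else 0.0

def correctness_reward(response: str, ground_truth: str) -> float:
    """1.0 if the extracted final answer matches the verifiable ground truth, else 0.0."""
    match = re.search(r"Answer:\s*(.+)", response)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def rlvr_reward(response: str, ground_truth: str) -> float:
    """Scalar reward for one sampled response, used as the RL training signal."""
    return correctness_reward(response, ground_truth) + format_reward(response)
```

During RL post-training, the model samples responses to problems with known answers and is updated to maximize this scalar reward; the fact that the check is programmatic and verifiable is what makes math and code such natural training domains.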