It was a long year, packed with breakthroughs that pushed AI toward better reasoning and efficiency. Throughout the second half of the year, we kept an eye on what matters most in our AI 101 series. Now we have a more complete picture of reinforcement learning (RL), model optimization techniques, continual learning, neuro-symbolic AI, multimodality, robotics, the hardware powering AI today, and other approaches that define the space.
While concepts show the general focus of AI researchers, methods illustrate more pointed approaches to solving specific problems and improving what already works. So let's see what we have at the end of this year: a solid foundation to build on in 2026.
|
AI Concepts That Defined 2025 |
But first, the gifts! Once a year, in this magical season of giving and receiving, we offer the only chance all year to get our Premium subscription with 20% OFF. Get it before Jan 1 →

P.S. We're working on offering more. Plan prices will increase in early 2026.
|
|
These are recaps of the key models and concepts from the second half of 2025.
|
1. Reinforcement Learning: The Ultimate Guide to Past, Present, and Future |
It is one of our most-read articles. Why? Reinforcement Learning (RL) – the idea of agents learning through trial and error – was shaping real-world systems long before it became the backbone of modern AI, unlocking new possibilities for models' reasoning and agentic behavior and pushing them toward the category of reasoning models or AI agents. It is everywhere in the conversation right now, so we put together a clear guide to what RL is, where it all started, and where it's headed.
|
2. The State of RL in 2025 |
Here's a closer look at how RL has evolved throughout 2025 since the release of DeepSeek-R1, and at the key trends driving RL today, including Reinforcement Learning with Verifiable Rewards (RLVR), the main policy optimization techniques, RL from human and AI feedback, RL scaling, and more. We also discuss why some people argue that RL is not as good as you might think.
|
3. RLHF variations: DPO, RRHF, RLAIF |
Reinforcement learning from human feedback (RLHF) became the default alignment strategy for LLMs in 2025. It nudges AI toward being more helpful and more consistent. |
However, we can't rely on just one alignment method as a one-size-fits-all solution. If you want to know more about the strong alternatives to RLHF, such as Direct Preference Optimization (DPO), Rank Responses to align Human Feedback (RRHF), and RL from AI Feedback (RLAIF), this episode is for you.
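To make the most popular of those alternatives concrete, here is a minimal sketch of the DPO objective (the variable names and toy numbers are ours): instead of training a separate reward model, the policy is optimized directly on preference pairs, with a frozen reference model anchoring the update.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over summed log-probs of chosen vs. rejected responses."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response more strongly than the
    # frozen reference model does, with beta controlling the strength.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy example: log-probs for a batch of two preference pairs
pi_w = torch.tensor([-12.3, -20.1])   # policy log p(chosen)
pi_l = torch.tensor([-15.8, -19.5])   # policy log p(rejected)
ref_w = torch.tensor([-13.0, -20.0])  # reference log p(chosen)
ref_l = torch.tensor([-14.9, -19.8])  # reference log p(rejected)
print(dpo_loss(pi_w, pi_l, ref_w, ref_l))
```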
|
Join Premium members from top companies for only $56/year. Offer expires on Jan 1! Don't miss out. Learn the basics and go deeper.
|
|
4. What is Continual Learning? |
The next stage after efficient training is teaching models to keep learning new things over time without forgetting what they already know. Neural networks are not very flexible when it comes to changes in data. Here, we explore how to achieve stable continual learning and walk through two interesting new methods:
- Nested Learning, with the new HOPE architecture for continual learning, from Google
- Sparse Memory Finetuning (using memory layers) by FAIR at Meta
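Neither of those methods fits in a few lines, but the classic baseline they improve on does. Below is a minimal rehearsal sketch (our own toy model and buffer, not HOPE or Sparse Memory Finetuning): a small memory of past examples is mixed into every new batch so the network keeps seeing old data while it learns new data.

```python
import random
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                       # toy stand-in for a real network
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

replay_buffer, BUFFER_SIZE = [], 256           # small memory of past batches

def train_step(x_new, y_new, replay_k=8):
    # Rehearsal: mix a few stored old batches into every update so the
    # network keeps practicing earlier data while it learns the new data.
    old = random.sample(replay_buffer, min(replay_k, len(replay_buffer)))
    xs = [x_new] + [x for x, _ in old]
    ys = [y_new] + [y for _, y in old]
    opt.zero_grad()
    loss_fn(model(torch.cat(xs)), torch.cat(ys)).backward()
    opt.step()
    if len(replay_buffer) < BUFFER_SIZE:       # remember some new data for later
        replay_buffer.append((x_new.detach(), y_new))

for _ in range(5):                             # toy usage with random data
    train_step(torch.randn(4, 10), torch.randint(0, 2, (4,)))
```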
|
AI 101: What is Continual Learning? | Can models add new knowledge without wiping out what they already know? We look at why continual learning is becoming important right now and explore the new methods emerging for it, including Googleβs Nested Learning and Metaβs Sparse Memory Fine-tuning | www.turingpost.com/p/continuallearning |
|
|
5. What's New in Test-Time Scaling? |
Last December, we made a bold bet that 2025 would be the "Year of Inference-Time Search." Looking back, that prediction defined the entire year. Many systems shifted the focus from the training stage to inference, allowing models to think slowly and thoroughly. Test-time compute is the key to influencing a model's behavior during inference, and scaling it can unlock even more from models. Here we dive deep into three outstanding approaches to test-time compute scaling:
- Chain-of-Layers (CoLa), allowing for better control and optimization of reasoning models
- MindJourney, a test-time scaling framework blending Vision-Language Models (VLMs) and world models
- Google Cloud's TTD-DR, applying the diffusion process to test-time scaling to build a better deep research agent
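All three go well beyond the basics, but the core move they share is easy to show. Here is a minimal best-of-N sketch of test-time compute scaling; `generate` and `score` are hypothetical placeholders for a sampler and a verifier, not anything from CoLa, MindJourney, or TTD-DR.

```python
import random

def generate(prompt: str) -> str:
    # Placeholder: in practice this would sample a candidate answer from an LLM.
    return f"candidate-{random.random():.3f}"

def score(prompt: str, answer: str) -> float:
    # Placeholder: in practice a reward model, verifier, or test suite.
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    # Spend extra compute at inference: sample N candidates, keep the best-scored one.
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("What is 17 * 23?", n=8))
```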
|
AI 101: What's New in Test-Time Scaling? | The test-time scaling journey into world models, agents, diffusion with a reminder to stay cautious and keep it controllable | www.turingpost.com/p/testtimescaling2 |
|
|
6. What is Neuro-Symbolic AI? |
Neuro-symbolic AI (or neural-symbolic AI) is a concept that appeared long ago, evolved in waves, and is now seen by many as a strong path toward next-level AI. It combines neural networks, which learn patterns from data, with symbolic systems, which handle structured knowledge and logic, to build models that can both predict like neural nets and reason and understand like humans. So it definitely shouldn't be overlooked.
AI 101: What is Neuro-Symbolic AI? | Everything you need to know about hybrid neuro-symbolic AI, how it blends strict logic and rules with neural networks, and why it shouldn't be overlooked | www.turingpost.com/p/neurosymbolic |
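As a tiny, hand-rolled illustration of the hybrid idea (not any specific system from the article): a neural net proposes label probabilities, and a symbolic rule base vetoes candidates that violate known constraints before the final prediction is made.

```python
import torch
import torch.nn as nn

LABELS = ["cat", "dog", "car", "truck"]
perception = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, len(LABELS)))

def allowed(label: str, facts: set) -> bool:
    # Symbolic side: hard, human-readable rules over known facts.
    if "indoor scene" in facts and label in {"car", "truck"}:
        return False
    return True

def predict(x: torch.Tensor, facts: set) -> str:
    probs = perception(x).softmax(-1)                         # neural side: pattern matching
    mask = torch.tensor([allowed(l, facts) for l in LABELS])  # rules veto impossible labels
    return LABELS[int((probs * mask).argmax())]

print(predict(torch.randn(16), facts={"indoor scene"}))
```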
|
|
|
7. The Future of Compute: Intelligence Processing Unit (IPU) and other alternatives to GPU/TPU/CPU |
Another one of our most-read and must-read articles. It's a deep dive into what really powers AI today. We look beyond GPUs' monopoly to explore CPUs, TPUs, ASICs, APUs, NPUs, and other hardware – what they are and where they are used. Read to learn everything you need to know about them.
|
8. Inside Robotics |
How do you build the body of AI? Here is a comprehensive guide to the basics of robotics – how it is trained and powered – from Figure 03, Neo, and Unitree robots to NVIDIA's freshest updates. A very fun episode.
AI 101: Inside Robotics | Building the body of AI: How Physical AI is trained and powered β from Figure 03, Neo, Unitree robots to NVIDIA freshest updates | www.turingpost.com/p/insiderobotics |
|
|
|
|
Outstanding AI Methods and Techniques in 2025 |
1. What matters for RL? Precision! Switching BF16 → FP16
Many paid attention to this method. To get better stability and accuracy in RL, you should switch the numerical precision from the newer BF16 format back to the older FP16. And here is why →
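The short version: BF16 keeps FP32's exponent range but only 7 mantissa bits, while FP16 keeps 10, so small numerical differences (for example between training-time and inference-time log-probs) survive rounding much better in FP16. A quick PyTorch check makes the gap visible:

```python
import torch

x = torch.tensor(1.001)
print(x.to(torch.bfloat16).item())  # 1.0           -- the .001 is rounded away (7 mantissa bits)
print(x.to(torch.float16).item())   # 1.0009765625  -- preserved by FP16 (10 mantissa bits)
```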
|
2. What are Modular Manifolds? |
Thinking Machines Lab has contributed a lot of great research to the community, and one piece of it is a fresh look at neural network optimization through modular manifolds. It steps into the area of geometry-aware optimization for better stability and consistency. We explain the key concepts (weights, gradients, norms, modular norms, manifolds, modular manifolds, and modular duality) to show how modular manifolds can breathe new life into current optimizers.
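This is not Thinking Machines' actual algorithm, only a toy sketch of the "constrain, step, retract" pattern that geometry-aware optimization builds on: take an ordinary gradient step, then pull the weights back onto the manifold chosen for them (here, rows kept on the unit sphere).

```python
import torch
import torch.nn.functional as F

W = F.normalize(torch.randn(8, 4), dim=1)      # start on the manifold: unit-norm rows

def manifold_step(W, grad, lr=0.1):
    W = W - lr * grad                          # ordinary gradient step (leaves the manifold)
    return F.normalize(W, dim=1)               # retraction: project rows back onto the sphere

W = manifold_step(W, torch.randn_like(W))
print(W.norm(dim=1))                           # every row has norm 1.0 again
```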
|
3. What is XQuant? |
Another important part of model optimization is optimizing memory use – it's a much bigger issue than optimizing math. XQuant and its XQuant-CL variation are very interesting methods that can reduce memory use by up to 12 times, adding just a little extra compute and bypassing typical techniques like the KV cache.
AI 101: What is XQuant? | Compute is not a big deal for LLMs now, but memory is. Explore how the new XQuant method and its XQuant-CL variation can cut memory use by up to 12 times | www.turingpost.com/p/xquant
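A rough sketch of the core trick as the article describes it (shapes, bit width, and the naive per-tensor quantizer are our simplifications): cache a low-bit copy of the layer input X instead of K and V, then recompute K = XW_k and V = XW_v when they are needed.

```python
import torch

d = 64
W_k, W_v = torch.randn(d, d), torch.randn(d, d)

def quantize(x, bits=8):
    # Naive symmetric per-tensor quantization, just for illustration.
    scale = x.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(x / scale).to(torch.int8), scale

X = torch.randn(128, d)           # token activations entering the attention layer
Xq, scale = quantize(X)           # the 8-bit X is what gets cached, not K/V
K = (Xq.float() * scale) @ W_k    # K and V are recomputed on the fly at decode time
V = (Xq.float() * scale) @ W_v
print(Xq.dtype, K.shape, V.shape)
```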
|
|
4. Fusing Modalities: Basics + the New MoS Approach |
AI models now work across images, text, audio, video, and more, but combining even two modalities is still a challenge. As AI systems race to become true all-rounders, multimodal fusion has become central to building models that understand how different data types work together. In this article, we explore how multimodal data is mixed, the challenges involved, and common fusion strategies, with a closer look at Meta AI and KAUST's Mixture of States (MoS) approach, which mixes data at the state vector level within each layer using a learnable router.
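To make the "state vector level" part concrete, here is a deliberately simplified router sketch in the spirit of MoS, not its exact architecture: a small learned router looks at each token's state and decides how much of the text state and the image state to blend.

```python
import torch
import torch.nn as nn

d = 64
router = nn.Linear(d, 2)                         # one mixing weight per modality

def fuse(text_state, image_state):
    w = router(text_state).softmax(-1)           # (tokens, 2), weights sum to 1 per token
    return w[..., :1] * text_state + w[..., 1:] * image_state

text_state = torch.randn(10, d)                  # per-token text states at some layer
image_state = torch.randn(10, d)                 # aligned image states at the same layer
print(fuse(text_state, image_state).shape)       # torch.Size([10, 64])
```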
|
5. What is Mixture-of-Recursions (MoR)? |
In 2025, developers also kept working on the Transformer itself. Mixture-of-Recursions (MoR) is a Transformer variant that allows layers to be reused dynamically, so each token receives only the amount of computation it needs. Instead of always running through a fixed number of layers, MoR controls how deeply each token "thinks" using routing and KV caching.
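A toy version of the recursion loop (not the exact MoR architecture; the block, router threshold, and shapes are ours): one shared block is applied repeatedly, and a tiny router decides per token whether to keep recursing or stop early.

```python
import torch
import torch.nn as nn

d, max_recursions = 64, 4
shared_block = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
router = nn.Linear(d, 1)                           # per-token "keep thinking?" score

def forward(x):                                    # x: (batch, tokens, d)
    active = torch.ones(x.shape[:2], dtype=torch.bool)
    for _ in range(max_recursions):
        if not active.any():
            break
        y = shared_block(x)                                    # reuse the same weights each pass
        x = torch.where(active.unsqueeze(-1), y, x)            # halted tokens stop updating
        active &= torch.sigmoid(router(x)).squeeze(-1) > 0.5   # router halts tokens per step
    return x

print(forward(torch.randn(2, 8, d)).shape)         # torch.Size([2, 8, 64])
```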
|
6. Rethinking Causal Attention |
This is also the year of reconsidering attention mechanisms. Models need access to rich global context, and CASTLE (Causal Attention with Lookahead Keys) is a modification of causal attention that provides it: models can incorporate limited information from future tokens while still generating text autoregressively. Really worth exploring →
AI 101: Rethinking Causal Attention | How Causal Attention with Lookahead Keys (CASTLE) and future-aware causal masks reshape the strict left-to-right order of autoregressive models, plus attention in cause-effect relationships. | www.turingpost.com/p/rethinkingcasualattention
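For reference, this is the constraint CASTLE relaxes; the lookahead-key mechanism itself is more involved and is what the article walks through. A standard causal mask lets token i attend only to positions up to i:

```python
import torch

T = 6
causal_mask = torch.tril(torch.ones(T, T, dtype=torch.bool))
print(causal_mask.int())
# Row i is 1 only in columns 0..i: no token sees the future. CASTLE's lookahead
# keys let earlier positions fold in limited information about later context
# while generation stays strictly left-to-right.
```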
|
|
|
If you're not done yet and want to refresh what was relevant in the first part of 2025 and what has changed, check out our two summer recaps as well:
|
|
Next week, weβll recap the models we covered this year. |
|
How did you like it? |
|