"If you think about the term AGI, especially in the context of pre-training, you will realize that the human being is not an AGI, because a human being lacks a huge amount of knowledge. Instead, we rely on continual learning."

Ilya Sutskever
Do you feel this shift too? The idea of models learning endlessly is showing up everywhere. We see it, we hear it, and it's all pushing the spotlight toward continual learning.

Continual learning is the ability to keep learning new things over time without forgetting what you already know. Humans do this naturally (as Ilya Sutskever also noted) and are very flexible in the face of changing data. Neural networks, unfortunately, are not. When developers change the training data, they often face what is called catastrophic forgetting: the model starts losing its previous knowledge, and they are forced back to training the model from scratch.

Finding the right balance between a model's plasticity and its stability in previously learned knowledge and skills is becoming a serious challenge right now. Continual learning is the path to more "intelligent" systems that save time, resources, and money spent on training; it helps mitigate biases and errors; and, in the end, it makes model deployment easier and more natural.

Today we'll look at the basics of continual learning and two approaches that are worth your attention: Google's very recent Nested Learning and Meta FAIR's Sparse Memory Finetuning. There is a lot to explore.

In today's episode, we will cover:

- Continual Learning: the essential basics
- Setups and scenarios for Continual Learning training
- How to help models learn continually? General methods
- What is Nested Learning?
- Cautious continual learning with memory layers
- Sparse Memory Finetuning
- Limitations
- Conclusion / Why continual learning is important now?
- Sources and further reading
Continual Learning: The essential basics

Continual learning means learning step-by-step from data that changes over time. So it is related to two main things:

- Non-stationary data: the data distribution does not stay the same and keeps shifting.
- Incremental learning: the model should add new knowledge without wiping out what it learned before.
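To make these two ingredients concrete, here is a minimal sketch (not from the article) of a learner doing single-pass, incremental updates on a stream whose distribution keeps drifting. The data generator and hyperparameters are invented for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(2, 2)                      # tiny classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

def make_batch(shift):
    """Non-stationary data: the input distribution drifts with `shift`."""
    x = torch.randn(32, 2) + shift           # the mean of the inputs keeps moving
    y = (x.sum(dim=1) > 2 * shift).long()    # labels depend on the current regime
    return x, y

# Incremental learning: each batch is seen once, in order, and never revisited.
for step in range(1000):
    x, y = make_batch(shift=step / 100.0)    # the distribution changes over time
    loss = loss_fn(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```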
The new pieces of information can be new skills, new examples, new environments, or new contexts. Because the data comes in gradually, continual learning is also known as lifelong learning. The process of continual learning happens when the model is already deployed.

Everything would be great if models didn't face one major challenge: catastrophic forgetting. The problem generally looks like this: a neural network is trained on Task 2 after Task 1, and its weights are updated for Task 2. This often pushes them away from the optimum for Task 1, and the model suddenly performs very poorly on that task.

The problem here is not the model's capacity; it usually stems from the sequential training procedure. As early as 1989-1990, Michael McCloskey, Neal J. Cohen, and R. Ratcliff identified this problem and showed that simple networks lose previous knowledge extremely quickly when trained sequentially. They also highlighted that this forgetting is much worse than in humans.

But if you train on Tasks 1 and 2 interleaved, forgetting does not happen.

Image Credit: Illustration of catastrophic forgetting, "Continual Learning and Catastrophic Forgetting" paper
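The sequential-versus-interleaved contrast above is easy to reproduce on toy data. Below is a hedged sketch, not taken from the cited papers: a small network trained on Task 1 and then only on Task 2 typically collapses on Task 1, while the same architecture trained on both tasks interleaved keeps both. The task definitions and hyperparameters are invented for illustration.

```python
import torch
import torch.nn as nn

def make_task(offset, n=2000):
    """Each task is two Gaussian blobs; `offset` shifts Task 2 away from Task 1."""
    x = torch.randn(n, 2) + offset
    y = (x[:, 0] > offset).long()            # the decision boundary moves with the task
    return x, y

def accuracy(model, x, y):
    with torch.no_grad():
        return (model(x).argmax(dim=1) == y).float().mean().item()

def train(model, batches, epochs=200):
    opt = torch.optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in batches:                 # one full-batch update per task per epoch
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

task1, task2 = make_task(offset=0.0), make_task(offset=5.0)

# Sequential training: Task 1 first, then Task 2 only; the weights drift away
# from the Task 1 optimum (catastrophic forgetting).
seq = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
train(seq, [task1])
train(seq, [task2])

# Interleaved training: both tasks are mixed into every epoch.
mix = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 2))
train(mix, [task1, task2])

print("sequential  - Task 1 accuracy:", accuracy(seq, *task1))   # typically drops sharply
print("interleaved - Task 1 accuracy:", accuracy(mix, *task1))   # typically stays high
```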
Preventing forgetting is only one part of the solution. Effective continual learning also requires:

- Fast adaptation
- The ability to leverage task similarities
- Task-agnostic behavior
- Robustness to noise
- High efficiency in memory and compute
- Avoiding storing all past data and retraining on all of it
If tasks are related, the model should get better at one after learning another, which marks positive knowledge transfer.

So, a good continual learning system needs the right balance: it should stay stable (not forget old things) while still being plastic enough to learn new ones. It also needs to handle differences within each task and across different tasks. How is this realized in practice?

Image Credit: "A Comprehensive Survey of Continual Learning: Theory, Method and Application" paper
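One common way to make these notions measurable, standard in the continual learning literature rather than defined in this article, is to record an accuracy matrix R, where R[i][j] is the accuracy on task j after training has finished on task i, and then read plasticity, forgetting, and transfer off that matrix. A small sketch with made-up numbers:

```python
import numpy as np

# R[i][j] = accuracy on task j after finishing training on task i (illustrative values).
R = np.array([
    [0.95, 0.40],   # after Task 1: good on Task 1, untrained on Task 2
    [0.70, 0.93],   # after Task 2: Task 1 dropped (forgetting), Task 2 learned
])
n = R.shape[0]

# Plasticity: how well each task is learned right after training on it.
plasticity = np.mean([R[i, i] for i in range(n)])

# Stability / forgetting: how far old-task accuracy falls by the end of training.
forgetting = np.mean([R[i, i] - R[n - 1, i] for i in range(n - 1)])

# Positive transfer would show up as final old-task accuracy staying at or above
# its just-trained level (forgetting <= 0), or as better-than-chance accuracy on
# tasks the model has not been trained on yet.
print(f"plasticity={plasticity:.2f}, average forgetting={forgetting:.2f}")
```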
Setups and scenarios for Continual Learning training

Continual learning is mainly about moving from one task to the next while keeping performance stable or improving it during ongoing learning. That's why two fundamental setups are used for it (a code sketch of both stream types follows below):

- Task-based continual learning: Data is organized into clear, separate tasks which are shown one after another, with explicit task boundaries. It is the most common setup, because it is convenient and controlled: you know exactly when tasks switch. But it doesn't represent the gradual changes found in the real world, and models may rely too heavily on boundaries for memory updates.
- Task-free continual learning: This one is more realistic, because it better reflects real-world data where distributions shift continuously. There is still an underlying set of tasks, but task boundaries are not given and transitions are smooth.
Image Credit: "Continual Learning and Catastrophic Forgetting" paper
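To make the difference concrete, here is a minimal, hypothetical sketch of how the two streams might be presented to a learner; the synthetic task data, function names, and drift schedule are all invented for illustration.

```python
import torch

def task_batch(task_id, batch_size=32):
    """Each task is a Gaussian blob centred at (3*task_id, 3*task_id)."""
    x = torch.randn(batch_size, 2) + 3.0 * task_id
    y = torch.full((batch_size,), task_id, dtype=torch.long)
    return x, y

# Task-based: hard boundaries, and the learner is told when the task switches.
def task_based_stream(num_tasks=3, steps_per_task=100):
    for task_id in range(num_tasks):
        for _ in range(steps_per_task):
            yield task_batch(task_id), task_id        # task identity is given

# Task-free: the same underlying tasks, but the mixture drifts smoothly and
# no boundary or task label is ever revealed to the learner.
def task_free_stream(num_tasks=3, total_steps=300):
    for step in range(total_steps):
        position = step / total_steps * (num_tasks - 1)   # drifts from 0 toward num_tasks-1
        low = int(position)
        p_next = position - low                           # probability of the next task
        task_id = min(low + int(torch.rand(()) < p_next), num_tasks - 1)
        yield task_batch(task_id), None                   # no task identity
```

In the task-based stream the learner receives the task identity and can, for example, consolidate its memory at each boundary; in the task-free stream it has to detect the distribution shift on its own.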
Continual learning researchers often use three main scenarios to describe what the model is expected to know at test time and whether it gets task identity information. Importantly, these scenarios are defined by how the changing data relates to the function the network must learn: