I was wondering if anyone has implemented Elastic Weight Consolidation (EWC) as outlined in this paper? This algorithm allows for sequential/continuous learning without the model encountering catastrophic forgetting.
The main part of implementing this is calculating the Fisher information matrix. If anyone has any code they can share on this, that’d be great. Otherwise I’m happy to attempt it and share my code here.
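The original EWC requires you to compute an importance value for each weight with an additional pass over the training set of the task you just finished. The importance of a weight is the squared gradient of the loss with respect to that weight, averaged over the minibatches, which gives an estimate of the diagonal of the Fisher information matrix. Anyway, you can take a look at the implementation available in the ContinualAI notebooks. ContinualAI is an association for continual learning.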
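In case it helps, here is a minimal sketch of that computation in plain Python. It is not the ContinualAI code and not the paper's reference implementation. I am assuming a tiny logistic-regression model so the per-sample gradient can be written by hand, and the names `diagonal_fisher`, `ewc_penalty` and `lam` are made up for illustration. It uses the true labels, so strictly speaking it estimates the empirical Fisher diagonal. In a real framework you would get the gradients from autograd instead.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def diagonal_fisher(weights, inputs, labels):
    """Mean over samples of the squared per-sample log-likelihood gradient.

    Assumption: logistic regression, where
    d/dw_i log p(y | x, w) = (y - p) * x_i.
    True labels are used, so this is the empirical Fisher diagonal.
    """
    fisher = [0.0] * len(weights)
    for x, y in zip(inputs, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for i, xi in enumerate(x):
            grad_i = (y - p) * xi
            fisher[i] += grad_i * grad_i
    return [f / len(inputs) for f in fisher]


def ewc_penalty(weights, old_weights, fisher, lam):
    """Quadratic EWC term: (lam / 2) * sum_i F_i * (w_i - w_old_i)^2."""
    return 0.5 * lam * sum(
        f * (w - w_old) ** 2
        for f, w, w_old in zip(fisher, weights, old_weights)
    )


# Toy data for "task A" (hypothetical values, the first feature acts as a bias).
task_a_inputs = [[1.0, 0.0], [1.0, 2.0], [1.0, -1.0]]
task_a_labels = [1, 1, 0]
theta_a = [0.0, 0.0]  # pretend these are the weights after training on task A

fisher = diagonal_fisher(theta_a, task_a_inputs, task_a_labels)

# While training on task B, add this term to the task B loss.
theta_b = [1.0, 1.0]
penalty = ewc_penalty(theta_b, theta_a, fisher, lam=2.0)
```

The penalty is zero while the weights stay at their task A values and grows fastest along the weights with the largest Fisher entries, which is what keeps the important weights close to the old solution. The strength `lam` is a hyperparameter you would have to tune.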