I’m training a model and want to load the last three saved checkpoints, average their weights, and save the result as a single new model. All checkpoints share the same architecture and were trained on the same data. Any ideas?
I can easily do this in Keras/TF, where I have more experience, but I need someone here who is more experienced in PyTorch.
A simple way to go about this would be to load each checkpoint in succession, accumulate its parameter values into appropriately sized tensors, and then divide by 3 to get the mean.
A simple one-layer example would be:

layer_1 = 0
for name, param in model.named_parameters():
    if name == 'fc.weight':
        layer_1 = torch.zeros_like(param.data)
Then for each checkpoint do:

for name, param in model.named_parameters():
    if name == 'fc.weight':
        layer_1 += param.data
Now divide layer_1 by three, create a new model instance, and run:

for name, param in model.named_parameters():
    if name == 'fc.weight':
        param.data.copy_(layer_1)
There is probably a much more elegant way to do this, but this is what comes to mind for me.
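A more general version of the same idea averages every entry of the checkpoints' state_dicts rather than a single named layer. This is a minimal sketch, assuming the checkpoints were saved with torch.save(model.state_dict(), path); the function name and paths are placeholders:

```python
import torch

def average_checkpoints(paths):
    """Average the parameters of several state_dict checkpoints (same architecture)."""
    avg = None
    for path in paths:
        sd = torch.load(path, map_location='cpu')
        if avg is None:
            # Clone so we don't modify the first checkpoint in place;
            # cast to float so integer buffers can be averaged too.
            avg = {k: v.clone().float() for k, v in sd.items()}
        else:
            for k in avg:
                avg[k] += sd[k].float()
    for k in avg:
        avg[k] /= len(paths)
    return avg

# avg = average_checkpoints(['ckpt1.pt', 'ckpt2.pt', 'ckpt3.pt'])  # hypothetical paths
# model.load_state_dict(avg) then torch.save(avg, 'averaged.pt')
```

Averaging in float32 on CPU sidesteps dtype and device mismatches between checkpoints; load the averaged dict back into a fresh model instance to get the merged network.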
I found an already implemented technique for this – ‘Stochastic Weight Averaging’ – and it works!