Is torch.save() limited to overwriting?

In my particular scenario, I want torch.save() to append the model state rather than overwrite it whenever it is called.
Assume I have a pre-trained cGAN model on 0 to 5 mnist dataset classes with classes as conditions (saved using torch.save() with the filename “trained.pt”).
Now I want to fine-tune this pre-trained model so that it can learn the remaining 6 to 9 classes. I saved the model to the same “trained.pt” file but noticed that it overwrote the model state, i.e. the model can generate images of digits 6-9 but ignores the 0-5 classes that were learned by the pre-trained model.
I tried to search but couldn’t come up with a solution so far. Is there a way I can append my fine-tuned model to the pre-trained model?
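For reference, here is a minimal sketch of the overwrite behavior and one common workaround (the `nn.Linear` stand-in and the `trained_step1.pt`/`trained_step2.pt` filenames are made up for illustration, not from the original setup):

```python
import torch
import torch.nn as nn

# tiny stand-in model (hypothetical; the actual cGAN generator goes here)
model = nn.Linear(4, 2)

# torch.save() always writes the file from scratch: calling it again with
# the same filename replaces the previous contents; there is no append mode
torch.save(model.state_dict(), "trained.pt")

# one workaround is to save each training stage under its own filename
torch.save(model.state_dict(), "trained_step1.pt")
# ... fine-tune on the remaining classes ...
torch.save(model.state_dict(), "trained_step2.pt")
```

Note that separate files only preserve separate snapshots; they don’t merge what the two stages learned.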

I don’t completely understand what “append the model state rather than overwrite it” means.
Assuming you are storing the state_dict of the model, it would contain all trained parameters and buffers for the current use case. If you then fine-tune the model on another use case (e.g. the numbers 6 to 9), your model might “forget” the previously learned use case, as the parameters are updated for the currently trained one. You wouldn’t be able to “append” this new model to the old one, as it’s basically still the same model with a different parameter set, so could you explain your use case a bit more, please?
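To illustrate (a minimal sketch with an `nn.Linear` stand-in and a hypothetical `checkpoint.pt` filename, not your cGAN): a state_dict is just a mapping from parameter/buffer names to tensors, and `load_state_dict` replaces the module’s current values in place, so there is no second parameter set to append to.

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 2)

# the state_dict maps parameter/buffer names to tensors
sd = model.state_dict()
print(list(sd.keys()))  # ['weight', 'bias']

torch.save(sd, "checkpoint.pt")

# loading overwrites the module's current parameter values in place;
# the model still has exactly one set of parameters afterwards
model.load_state_dict(torch.load("checkpoint.pt"))
```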

Thanks for responding.
So basically my use case is to generate 0-9 mnist numbers using cGAN. However, I want to learn these 0-9 numbers (classes and corresponding images) in two steps.
Step1: The model learns 0 to 5 classes first.
Step2: The model learns the remaining classes i.e. 6 to 9 however WITHOUT forgetting step 1 learned classes i.e. after step 2 my model can generate images of all classes 0-9.

I have recently moved from tensorflow to pytorch. In TF I can easily do this without any extra code.
From the thread below I can see that in PyTorch I cannot do this, because torch.save() overwrites and doesn’t provide any append functionality.

Could you describe what TensorFlow is “appending”?
Storing two sets of parameters wouldn’t make sense unless you want to use two different models, so I’m unsure how this can work.

TF doesn’t append specifically; I mean it’s the default behavior of TF. In TF I can do these two steps, and after step 2 my model remembers the 0-9 classes, whereas in PyTorch it forgets the step 1 classes.

That would depend on the training and should be unrelated to torch.save.
I would recommend checking the training itself to see why your model forgets the previous cases.
Generally, I would expect a pre-trained model (e.g. a ResNet trained on ImageNet) to forget the 1000 classes after I sufficiently train it on my new use case (e.g. a 5-class guitar classifier).

Based on your use case I assume your model has 10 output neurons, while you are ignoring a subset in both steps?

Exactly, my model has 10 output neurons, and I am ignoring half of them in these two steps. A little explanation is below.

Assume that, prior to training, the model is set up for a total of ten classes (the target one-hot vector has size 10). However, in step 1 I show only the first five classes (the one-hot vector’s last 5 values remain 0, and the first five change depending on the input class), then train and save the model.

In step 2, I reload the model (which has already learned the first five classes) and train it on the remaining classes without revisiting the previously learned ones (the one-hot vector’s first 5 values always remain 0, and the last five change depending on the input class), then save it again.

Then, I expect the model to know all ten classes at this point, but in my experiment, it forgets the step 1 learned classes.
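The two-step conditioning described above can be sketched like this (a minimal sketch; the exact label split of 0-4 for step 1 and 5-9 for step 2 is my assumption based on the “first five / last five” description):

```python
import torch
import torch.nn.functional as F

NUM_CLASSES = 10  # the full one-hot size is fixed from the start

def condition(labels):
    # always a 10-dim one-hot vector, regardless of training step
    return F.one_hot(labels, NUM_CLASSES).float()

# step 1: only labels 0-4 appear, so the last 5 entries stay 0
step1 = condition(torch.tensor([0, 3, 4]))
# step 2: only labels 5-9 appear, so the first 5 entries stay 0
step2 = condition(torch.tensor([5, 9, 7]))
print(step1[:, 5:].sum().item(), step2[:, :5].sum().item())  # 0.0 0.0
```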

I think the expectation wouldn’t hold, since the model could learn to extract features only for the current task.
While the weights for the 5 previously learned classes in the classification layer wouldn’t be updated, the feature extractor would be.
I would expect the model to forget the previously learned classes, especially if the features between the tasks differ a lot.
E.g. an artificial example would be to learn to classify 5 bird classes first and 5 cancer classes based on CT scans later. The model could try to extract a completely different feature set for these images, and I would expect its performance on the bird dataset to decrease.
Of course, your use case is different, i.e. your samples come from the same dataset, but I would still think the model could focus on different feature sets.
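A minimal sketch of that point (stand-in `nn.Linear` modules, not the actual cGAN): even when a batch contains only step-2 classes, the shared feature extractor still receives gradients, so the representation the step-1 classes depend on keeps drifting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
features = nn.Linear(8, 16)   # shared feature extractor (stand-in)
head = nn.Linear(16, 10)      # 10-way classification layer (stand-in)

x = torch.randn(4, 8)
labels = torch.tensor([5, 6, 7, 8])  # a batch with step-2 classes only

loss = F.cross_entropy(head(torch.relu(features(x))), labels)
loss.backward()

# the feature extractor gets a non-zero gradient from this batch,
# even though none of the step-1 classes are present in it
print(features.weight.grad.abs().sum().item() > 0)  # True
```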
Do you have any resources which have evaluated the ability to “remember a task” of DL models?