We are doing continual learning: after a while a new class is encountered, and the number of output units in the network’s last nn.Linear() classification layer is grown to match the new total number of classes. The newly added slice is initialized, while the units of the existing classes are left as they are. To make sure the optimizer picks this up, we resize the gradient tensors in the same way, as well as the corresponding tensors for a potential bias. (The optimizer is then re-instantiated.)
These in-place resizing operations, e.g. weight.data.resize_(…), worked fine in PyTorch 1.0. We checked numerically that the existing weight values are preserved and that the updates are applied correctly.
In the newer PyTorch 1.1, these resizing operations no longer seem to be allowed. I have seen the thread "RuntimeError: set_sizes_contiguous is not allowed on Tensor created from .data or .detach(), in Pytorch 1.1.0", where the suggestion is to avoid .data and to use copy operations under torch.no_grad() instead. I am unsure whether this is the ideal implementation for our case. Our code can be found here: https://github.com/MrtnMndt/OCDVAE_ContinualLearning/blob/master/lib/Models/architectures.py in lines 8-52 and the corresponding issue is here: https://github.com/MrtnMndt/OCDVAE_ContinualLearning/issues/1
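To make the question concrete, here is a minimal sketch of how I understand that suggestion would translate to our case: instead of resizing weight.data, build the larger tensors under torch.no_grad() and re-register them as new parameters. The helper name grow_linear_in_place and the initialization of the new rows are my own assumptions for illustration, not what our repository currently does.

```python
import torch
import torch.nn as nn


def grow_linear_in_place(layer: nn.Linear, num_new: int) -> None:
    """Hypothetical helper: add `num_new` output units, keeping the old rows."""
    old_out, in_features = layer.weight.shape
    has_bias = layer.bias is not None

    with torch.no_grad():
        # Rows for the new classes; this init scheme is an assumption.
        new_rows = layer.weight.new_empty(num_new, in_features)
        nn.init.kaiming_uniform_(new_rows, a=5 ** 0.5)
        weight = torch.cat([layer.weight, new_rows], dim=0)
        if has_bias:
            bias = torch.cat([layer.bias, layer.bias.new_zeros(num_new)], dim=0)

    # Re-register fresh Parameters rather than resizing .data in place.
    layer.weight = nn.Parameter(weight)
    if has_bias:
        layer.bias = nn.Parameter(bias)
    layer.out_features = old_out + num_new


layer = nn.Linear(4, 3)
grow_linear_in_place(layer, 2)
print(layer.weight.shape, layer.bias.shape)
```

Since the parameters are new objects afterwards, any existing optimizer still points at the old tensors, so it would have to be re-created, as we already do.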
I went through the patch notes and couldn’t find any information about resizing. I understand that operations on .data are discouraged in general, but I would appreciate it if someone could shed more light on why this has been removed entirely, as I believe there are cases (like dynamic architectures) where this functionality is quite useful. We had been working on such dynamic architecture operations (in any layer) in earlier PyTorch versions like 0.3 and 0.4, and it always seemed to work fine, as in this thread: Dynamically Expandable Networks - Resizing Weights during Training
Even more importantly, I would appreciate recommendations on how to adapt our code to PyTorch 1.1 in the intended way (instead of coding some hacky solution).
I suppose one of the more hacky solutions would be to copy all the old weight values, remove the last linear layer, create and add a new last layer of the correct shape, copy the old weights back into the corresponding slice, and create a new optimizer. Is this now the only way of doing layer reshaping operations, or is there a more straightforward way, similar to what we did before?
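Spelled out, the replace-the-layer variant described above would look roughly like this. This is only a sketch: replace_classifier is a hypothetical helper, the model is assumed to be an nn.Sequential whose last module is the classifier, and plain SGD stands in for whatever optimizer is actually used.

```python
import torch
import torch.nn as nn


def replace_classifier(model: nn.Sequential, num_new: int, lr: float = 0.01):
    """Hypothetical helper: swap the last nn.Linear for a wider one."""
    old = model[-1]
    has_bias = old.bias is not None

    # New layer with default initialization for all rows.
    new = nn.Linear(old.in_features, old.out_features + num_new, bias=has_bias)
    new = new.to(device=old.weight.device, dtype=old.weight.dtype)

    with torch.no_grad():
        # Copy the old classes' parameters into the leading slice.
        new.weight[: old.out_features].copy_(old.weight)
        if has_bias:
            new.bias[: old.out_features].copy_(old.bias)

    model[-1] = new
    # The old optimizer still references the discarded parameters.
    return torch.optim.SGD(model.parameters(), lr=lr)


model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 3))
optimizer = replace_classifier(model, num_new=2)
print(model[-1])
```

One thing I am unsure about with this variant: a new optimizer starts without the old momentum or other per-parameter state, which may or may not matter in our setting.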
I would appreciate any answers, comments, or pointers to patch notes etc…