I was fine-tuning a VGG network for style transfer (Gatys et al, 2015) and kept getting results that made no sense. After some debugging, it turned out I had loaded the weights incorrectly. Can someone point out my error? I tried the same approach on other models and it seemed to work.
import copy
import os
import torch
import vgg16

pretrained_weights = torch.load(os.path.join('pretrained_weights', 'vgg16-00b39a1b.pth'))
feature_extractor = vgg16.vgg16(pretrained=False)
for _n, par in feature_extractor.named_parameters():
    par.requires_grad = False
    par = copy.deepcopy(pretrained_weights[_n])
    par = par.to(device)
vgg16 is the same vgg16.py from torchvision.models.
I believe copy.deepcopy didn’t change the parameters for some reason.
This does not work because par = foo only rebinds the name par to the object foo. The tensor that par referred to before is not modified; it still lives inside the model, and the name par simply no longer points to it.
If you want to write into the par Tensor, you need to use an in-place operation like par.copy_(foo), which copies the values into the Tensor that par refers to.
Also, the .to() operation on a Tensor is always out of place: it returns a result and leaves the original Tensor untouched, so you cannot do it like that. You will have to use feature_extractor.to(device) to move your weights to the right device.
t = some_cpu_tensor
t.cuda()
# here t is not on the gpu (the result of .cuda() was discarded)

t = some_cpu_tensor
tc = t.cuda()
# here tc is on the gpu, t is still on the cpu

t = some_cpu_tensor
t = t.cuda()
# here t is on the gpu
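Putting both points together, the loop from the question could be rewritten roughly as below. This is only a sketch: a tiny hypothetical make_model() stands in for vgg16.vgg16, and a cloned state dict stands in for the vgg16-00b39a1b.pth checkpoint, so that it runs without downloading anything.

```python
import torch
import torch.nn as nn


def make_model():
    # Hypothetical stand-in for vgg16.vgg16(pretrained=False).
    return nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))


# Stand-in for torch.load(...) of the checkpoint: a dict of name -> tensor.
pretrained_weights = {k: v.clone() for k, v in make_model().state_dict().items()}

# Freshly initialised model whose weights differ from the "checkpoint".
feature_extractor = make_model()

# In-place copy: writes into the tensors the model actually owns.
with torch.no_grad():
    for name, par in feature_extractor.named_parameters():
        par.copy_(pretrained_weights[name])
        par.requires_grad = False

# Move the whole module; nn.Module.to() updates the module's parameters.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
feature_extractor.to(device)

# Alternative: let the module copy the values itself.
other = make_model()
other.load_state_dict(pretrained_weights)
```

If the checkpoint keys match the parameter names, as they appear to in the question, load_state_dict is probably the simpler option, since it also covers buffers and reports missing or unexpected keys.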
The thing is that the Tensor itself and the Python name associated with it are two different things.
Suppose par is the name of some tensor. When you call par.copy_(foo), you write in place into the tensor named par. If you do par = par + 1, you create a new tensor holding the value of par plus 1 and then bind the name par to that new tensor. The Tensor that par refers to at the end is not the same object as the original one.
In your case, if you do par = par.cuda(), after that line, par will point to a cuda tensor. But the original Tensor that is contained in the network won’t be changed.
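The name-versus-tensor distinction can be checked directly with Python's `is` operator. A minimal sketch, using a small Linear layer as a stand-in for the real network (the variable names are just for illustration):

```python
import torch

model = torch.nn.Linear(2, 2)
par = model.weight      # `par` and `model.weight` name the same tensor
original = par          # keep a second name for the original object

# In-place write: the tensor owned by the model is modified.
with torch.no_grad():
    par.copy_(torch.ones(2, 2))
same_after_copy = par is original

# Rebinding: `par + 1` builds a new tensor and `par` now names that one.
par = par + 1
same_after_rebind = par is original

print(same_after_copy, same_after_rebind)
print(model.weight)     # still holds the values written by copy_
```

After the rebinding, model.weight is unaffected by anything done to par, which is the situation in the loop from the question.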
Isn’t this what I did to put par on CUDA with par = par.to(device)? I don’t see the difference between moving the whole model with model = model.to(device) and moving the par tensor. Are you saying that a single tensor can only be put on CUDA with the t = t.cuda() command?
The difference is that model.to(device) works in place on the module (it changes the parameters the module holds), while tensor.to(device) does not change the tensor it is called on.
Does the sample below help?
import torch
import copy
# I will use `.double()` instead of `.cuda()` because my local
# machine does not have cuda, but they behave the same way
tensor = torch.zeros(10)
model = torch.nn.Linear(10, 10)
new_val = torch.ones(10)
def your_copy(par):
    par = copy.deepcopy(new_val)
    par = par.double()
print("Your changes are out of place and so not reflected outside")
print("Before your copy on tensor")
print(tensor)
print(tensor.type())
your_copy(tensor)
print("After your copy on tensor")
print(tensor)
print(tensor.type())
print("Models are changed inplace:")
print("Before on module")
print(model.weight.type())
model.double()
print("After on module")
print(model.weight.type())
def my_suggestion(par):
    par.copy_(new_val)
    # No way to change type here inplace
print("With inplace change, you can change value and have it reflected outside")
print("Before my copy")
print(tensor)
my_suggestion(tensor)
print("After my copy")
print(tensor)