Weights loaded incorrectly

I was fine-tuning a VGG network for style transfer (Gatys et al., 2015) and kept getting results that made no sense. After some debugging, it turned out I had loaded the weights incorrectly. Can someone point out my error? I tried the same approach on other models and it seemed to work.

import copy
import os

import torch

import vgg16
pretrained_weights = torch.load(os.path.join('pretrained_weights', 'vgg16-00b39a1b.pth'))
feature_extractor = vgg16.vgg16(pretrained=False)

for _n,par in feature_extractor.named_parameters():  
    par.requires_grad = False
    par = copy.deepcopy(pretrained_weights[_n]) 
    par = par.to(device)

vgg16 is the same vgg16.py from torchvision.models.

I believe copy.deepcopy didn’t change the parameters for some reason.

Did you try the recommended way of loading weights from the tutorials? Does it not work in your case?

I did later, and it fixed the issue. My question is why this version doesn’t work.


This does not work because par = foo only rebinds the name par to the object foo. The parameter tensor that par referred to before is not modified; it still lives in the model with its old values.
If you want to write into the tensor that par refers to, you need an in-place operation such as par.copy_(foo).

Also, the .to() operation on a tensor is always out of place, so you cannot use it like that. Use feature_extractor.to(device) to move your weights to the right device.
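For illustration, here is a minimal sketch of the loop with both fixes applied. It assumes a small nn.Linear as a stand-in for the VGG model, and the `source` module is a hypothetical placeholder for the pretrained weights file, so treat it as an outline rather than the exact code for your setup:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins (assumptions): `source` plays the role of the pretrained
# checkpoint, `model` plays the role of feature_extractor.
source = nn.Linear(4, 2)
pretrained_weights = source.state_dict()
model = nn.Linear(4, 2)

with torch.no_grad():
    for name, par in model.named_parameters():
        par.requires_grad = False
        # copy_ writes into the parameter tensor the model already owns.
        par.copy_(pretrained_weights[name])

# Module.to moves the module's parameters in place (and returns the module).
model = model.to(device)

weights_match = all(
    torch.equal(par.cpu(), pretrained_weights[name])
    for name, par in model.named_parameters()
)
print("weights match:", weights_match)
```

When the parameter names and shapes all match, model.load_state_dict(pretrained_weights) does the same copy in one call.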

Thanks. Sorry, but why is that operation ‘out of place’? Won’t this put the tensor on CUDA?

The code below should be clear.

t = some_cpu_tensor
# here t is not on the gpu

t = some_cpu_tensor
tc = t.cuda()
# here tc is on the gpu

t = some_cpu_tensor
t = t.cuda()
# here t is on the gpu

@sigma_x Just curious, why aren’t you using
feature_extractor = vgg16.vgg16(pretrained=True)?

But in my code, are you saying that the par tensor will not be placed on the device?

The thing is that the Tensor itself and the Python name associated with it are two different things.
Say par is some tensor. When you do par.copy_(), you write in place into the tensor named par. If you do par = par + 1, you create a new tensor that contains the value of par plus 1, then associate the name par with that new tensor. The Tensor that par refers to at the end is not the same object as the original one.

In your case, if you do par = par.cuda(), after that line, par will point to a cuda tensor. But the original Tensor that is contained in the network won’t be changed.
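A small sketch of that distinction, assuming an nn.Linear as a stand-in network and using .double() in place of .cuda() so that it runs on a CPU-only machine (both return a new tensor rather than modifying the original):

```python
import torch
import torch.nn as nn

model = nn.Linear(3, 1)
original = model.weight      # the tensor the network actually holds

par = model.weight
par = par.double()           # rebinding: `par` now names a brand-new tensor

# The network still holds the original float32 parameter.
model_untouched = model.weight is original and model.weight.dtype == torch.float32
par_is_new_tensor = par is not original and par.dtype == torch.float64

with torch.no_grad():
    # In place: writes into the tensor the network holds.
    model.weight.copy_(torch.ones(1, 3))

inplace_changed_model = bool((original == 1).all())

print(model_untouched, par_is_new_tensor, inplace_changed_model)
```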

In your code snippet

t = some_cpu_tensor
t = t.cuda()

Isn’t this what I did to put par on CUDA with par = par.to(device)? I don’t see the difference between moving the whole model with model = model.to(device) and moving the par tensor. Are you saying that a single tensor can only be put on CUDA with the t = t.cuda() command?


The difference is that model.to(device) works in place, while tensor.to(device) does not.

Does the sample below help?

import torch
import copy

# I will use `.double()` instead of `.cuda()` because my local
# machine does not have cuda, but they behave the same way

tensor = torch.zeros(10)
model = torch.nn.Linear(10, 10)

new_val = torch.ones(10)
def your_copy(par):
  par = copy.deepcopy(new_val)
  par = par.double()

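print("Your changes are out of place and so not reflected outside")
print("Before your copy on tensor")
print(tensor, tensor.dtype)
your_copy(tensor)
print("After your copy on tensor")
print(tensor, tensor.dtype)  # unchanged: still zeros, still float32

print("Models are changed inplace:")
print("Before on module")
print(model.weight.dtype)
model.double()
print("After on module")
print(model.weight.dtype)  # now float64

def my_suggestion(par):
  # Write into the existing tensor instead of rebinding the name
  par.copy_(new_val)
  # No way to change type here inplace

print("With inplace change, you can change value and have it reflected outside")
print("Before my copy")
print(tensor, tensor.dtype)
my_suggestion(tensor)
print("After my copy")
print(tensor, tensor.dtype)  # now ones, dtype still float32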

@malicorahul: some layers may have been modified, so the number of weights no longer matches the pretrained checkpoint. In that case it’s better to load the weights manually.
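One possible way to do that manual loading is sketched below. The make_net helper and both networks are hypothetical stand-ins (not the VGG code from this thread): the idea is to keep only the checkpoint entries whose name and shape still match, then load them with strict=False.

```python
import torch
import torch.nn as nn

def make_net(num_classes):
    # Hypothetical toy network; only the last layer depends on num_classes.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, num_classes))

pretrained = make_net(num_classes=10)   # stand-in for the pretrained checkpoint
model = make_net(num_classes=3)         # same network with a modified last layer

pretrained_weights = pretrained.state_dict()
own_state = model.state_dict()

# Keep only entries whose name and shape match the modified model.
compatible = {
    name: tensor
    for name, tensor in pretrained_weights.items()
    if name in own_state and own_state[name].shape == tensor.shape
}
skipped = sorted(set(pretrained_weights) - set(compatible))

# strict=False tolerates the keys that were filtered out.
model.load_state_dict(compatible, strict=False)
print("skipped:", skipped)
```

The skipped layers keep their fresh initialization, which is usually what you want for a replaced classifier head.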