Overwriting weights of a pre-trained network like AlexNet

Mujtaba · January 5, 2018, 3:34pm

Hi Guys,
We can extract weights of a pre-trained network in the following ways:

model = models.alexnet(pretrained=True)
param = model.parameters()

weights_conv1 = next(param)
bias_conv1 = next(param)

OR

weights_conv1 = list(list(model.features.children())[0].parameters())[0]
bias_conv1 = list(list(model.features.children())[0].parameters())[1]

OR

value = model.state_dict()
weights_conv1 = value['features.0.weight']  # the key is 'weight', not 'weights'
bias_conv1 = value['features.0.bias']

OR

weights_conv1 = model.state_dict()['features.0.weight']  # the key is 'weight', not 'weights'
bias_conv1 = model.state_dict()['features.0.bias']

My question is: can we overwrite these weights/biases like this?

model.state_dict()['features.0.bias'] = Variable(torch.randn(64))

OR

list(list(model.features.children())[0].parameters())[1] = Variable(torch.randn(64))

I am trying this, but it's not working. Is there another method for this?
I actually want to provide the weights myself and not let the model learn them by backpropagation.

smth · January 11, 2018, 2:00pm

What you need to do is:

state_dict = model.state_dict()
fbias = state_dict["features.0.bias"]
state_dict["features.0.bias"] = Variable(fbias.data.new(64).normal_()) # make a random tensor of same type and device as original
model.load_state_dict(state_dict)
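
On newer PyTorch releases (0.4 and later), where Variable has been merged into Tensor, the same idea can be sketched roughly as below. This is only a minimal sketch, not code from the thread: a small nn.Sequential stands in for AlexNet's first conv layer so it runs without downloading the pretrained weights (so the key is "0.bias" rather than "features.0.bias"), and the requires_grad = False line is one assumed way to keep the hand-set bias out of backpropagation.

```python
import torch
import torch.nn as nn

# Stand-in for AlexNet's first conv layer (assumption: same shape as features.0).
model = nn.Sequential(nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2))

state_dict = model.state_dict()
old_bias = state_dict["0.bias"]  # would be "features.0.bias" in AlexNet

# Build a random tensor with the same shape, dtype and device as the original.
new_bias = torch.randn_like(old_bias)
state_dict["0.bias"] = new_bias

# Rebinding the dict entry alone does not touch the model's parameter ...
unchanged_before_load = not torch.equal(model[0].bias.detach(), new_bias)

# ... the values only reach the parameter once the dict is loaded back.
model.load_state_dict(state_dict)
matches_after_load = torch.equal(model[0].bias.detach(), new_bias)

# Mark the parameter as frozen so no gradient is computed for it.
model[0].bias.requires_grad = False
```

The two boolean flags are there to show why the assignment in the original question has no effect by itself: model.state_dict() hands back a dictionary, and replacing one of its entries changes only that dictionary until load_state_dict is called.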

tor · July 6, 2018, 8:33am

Which is the better practice: the approach above, or the one below?
So far both seem to behave the same, but I am afraid there is a catch.

state_dict = model.state_dict()
fbias = state_dict["features.0.bias"]

# approach 1: from above
state_dict["features.0.bias"] = Variable(fbias.data.new(64).normal_()) 
model.load_state_dict(state_dict)

# approach 2: from https://stackoverflow.com/questions/49446785/how-can-i-update-the-parameters-of-a-neural-network-in-pytorch
state_dict["features.0.bias"].copy_( Variable(fbias.data.new(64).normal_()) )

Thank you.

smth · July 6, 2018, 12:36pm

The copy is probably slightly nicer. If state_dict["features.0.bias"] shares storage with something else (for example, if features.0.bias is a view of another Tensor), the in-place copy makes sure that the sharing is preserved.
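
A rough sketch of the in-place route on current PyTorch, to illustrate the storage-sharing point. As an assumption for the sake of a runnable example, a small nn.Sequential again stands in for AlexNet's first layer (so the key is "0.bias"), and the alias tensor is a hypothetical second reference to the same storage.

```python
import torch
import torch.nn as nn

# Stand-in for AlexNet's first conv layer (assumption: same shape as features.0).
model = nn.Sequential(nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2))

bias_param = model[0].bias
alias = bias_param.detach()          # hypothetical second tensor sharing the storage
ptr_before = bias_param.data_ptr()   # address of the underlying memory

new_values = torch.randn(64)

# Write into the existing storage; the in-place edit is kept out of autograd.
with torch.no_grad():
    model.state_dict()["0.bias"].copy_(new_values)

same_object = model[0].bias is bias_param
same_storage = model[0].bias.data_ptr() == ptr_before
param_updated = torch.equal(bias_param.detach(), new_values)
alias_sees_update = torch.equal(alias, new_values)
```

Because copy_ overwrites the values in place, no load_state_dict call is needed here, and anything else that refers to the same memory (alias in this sketch) sees the new values too.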