Weight initialization shape question

I have a neural network with input = 1000, hidden = 500, and output = 2 neurons for classification. I am trying to do transfer learning, and my pretrained model has input = 1000, hidden = 500, and output = 200 neurons. However, when I initialize the classification model's parameters with the learnt parameters, there is no error about the different number of output-layer neurons.

There is no error when I initialize by directly setting the data. However, when I initialize with load_state_dict(), the correct behavior occurs and I get a shape-mismatch error.
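The difference can be reproduced with a minimal sketch (the two nn.Sequential models below are hypothetical stand-ins for the described 1000-500-2 classifier and 1000-500-200 pretrained network):

```python
import torch
import torch.nn as nn

# Hypothetical minimal versions of the two models: same input/hidden
# sizes, different output sizes (2 vs. 200).
clf = nn.Sequential(nn.Linear(1000, 500), nn.ReLU(), nn.Linear(500, 2))
pre = nn.Sequential(nn.Linear(1000, 500), nn.ReLU(), nn.Linear(500, 200))

# Directly overwriting .data performs no shape check and silently
# replaces the 2x500 output weight with the 200x500 one.
clf[2].weight.data = pre[2].weight.data
print(clf[2].weight.shape)  # torch.Size([200, 500])

# load_state_dict() compares shapes and raises a RuntimeError instead.
clf2 = nn.Sequential(nn.Linear(1000, 500), nn.ReLU(), nn.Linear(500, 2))
try:
    clf2.load_state_dict(pre.state_dict())
except RuntimeError:
    print("size mismatch caught")
```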

You are manually overriding the .data attribute, which won’t trigger any shape checks, so it would be your responsibility to make sure the parameters are assigned properly using this approach.
With that being said, note that you should generally not use the .data attribute, as it could yield unwanted side effects. Instead, if you want to manually assign the parameters, wrap the assignment in a with torch.no_grad() block and use the .weight and .bias attributes.
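As a sketch of that safer pattern (again using hypothetical nn.Sequential stand-ins for the two models), only the shape-compatible hidden layer is copied; copy_() does validate shapes, and torch.no_grad() keeps autograd from recording the assignment:

```python
import torch
import torch.nn as nn

# Hypothetical models; only the hidden-layer shapes match (500x1000).
clf = nn.Sequential(nn.Linear(1000, 500), nn.ReLU(), nn.Linear(500, 2))
pre = nn.Sequential(nn.Linear(1000, 500), nn.ReLU(), nn.Linear(500, 200))

# Copy the compatible hidden layer via the .weight and .bias attributes.
with torch.no_grad():
    clf[0].weight.copy_(pre[0].weight)
    clf[0].bias.copy_(pre[0].bias)

# The 500x2 output layer keeps its fresh initialization.
```

Unlike a .data assignment, copy_() would raise a RuntimeError if the shapes did not match, so a wrong-sized pretrained layer cannot slip in silently.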

Thank you for your response. When I initialize the weights with the incorrect number of neurons, the current classification network's shape does not change. For example, the classification network is shaped 1000-500-2 and the weights I am initializing it with are shaped 1000-500-200. When I use .data to initialize, the resulting shape remains 1000-500-2. According to your response, shouldn't it change to 1000-500-200? Or am I understanding it the other way around?

The model will change, but since you've manipulated the internal parameters manually, the in_features and out_features attributes (and thus the printed module representation) won't be updated, as seen here:

# default setup
model = nn.Linear(10, 10, bias=False)
print(model)
> Linear(in_features=10, out_features=10, bias=False)
x = torch.randn(1, 10)
out = model(x)
print(out.shape)
> torch.Size([1, 10])

# manual manipulation
with torch.no_grad():
    model.weight = nn.Parameter(torch.randn(1, 10))
print(model)
> Linear(in_features=10, out_features=10, bias=False) # wrong, as you've manually manipulated the parameter

out = model(x)
print(out.shape)
> torch.Size([1, 1]) # new, expected output shape

I would generally not recommend manipulating internals manually unless you are sure that's the right approach.


Thank you so much for your clarification!