In certain situations, I would like to use custom weights in certain layers (e.g., from previously trained models). However, I noticed that if I assign weight and bias values manually, they don’t seem to update during training. Below is a simplified example using “normal_” and “zero_”. Here, if I comment out the lines that call them, the model learns. This suggests there’s something wrong with my approach. Since self.linear_*.weight is already a Parameter instance, I thought overwriting *.weight and *.bias would be enough, but it doesn’t seem that way.
Would be nice if someone could shed some light onto this issue!
class Model(torch.nn.Module):

    def __init__(self, num_features, num_classes):
        # Required before submodules can be assigned to the module
        super().__init__()
        self.linear_1 = torch.nn.Linear(num_features, num_hidden_1)
        self.linear_2 = torch.nn.Linear(num_hidden_1, num_hidden_2)
        self.linear_out = torch.nn.Linear(num_hidden_2, num_classes)

    def forward(self, x):
        out = self.linear_1(x)
        out = F.relu(out)
        out = self.linear_2(out)
        out = F.relu(out)
        out = self.linear_out(out)
        # Returns probabilities, not logits
        out = F.softmax(out, dim=1)
        return out
model = Model(num_features=num_features,
Could you check if the gradients for those parameters are zero? What you’re doing should be sufficient to add custom weights.
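For reference, here is a minimal sketch of that approach on a single hypothetical layer (the sizes and the dummy data are made up for illustration, not taken from the original model). It initializes the weights in place under torch.no_grad(), which leaves them registered as Parameters, and then takes one SGD step to see whether they move.

```python
import torch

torch.manual_seed(0)

# Hypothetical layer sizes, chosen only for illustration.
layer = torch.nn.Linear(4, 3)

# In-place initialization keeps `weight` and `bias` as registered Parameters.
with torch.no_grad():
    layer.weight.normal_(mean=0.0, std=0.1)
    layer.bias.zero_()

is_parameter = isinstance(layer.weight, torch.nn.Parameter)

# One SGD step on dummy data to confirm the custom weights still update.
optimizer = torch.optim.SGD(layer.parameters(), lr=0.1)
before = layer.weight.detach().clone()

x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))
loss = torch.nn.functional.cross_entropy(layer(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()

weights_changed = not torch.equal(before, layer.weight.detach())
print("still a Parameter:", is_parameter)
print("weights changed:", weights_changed)
```

The same pattern should work for values taken from another model, for example by calling copy_ on the weight inside the no_grad block instead of normal_.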
All linear layers seem to have gradients.
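One way to run such a check, sketched on a hypothetical stand-in model with dummy data (none of the sizes come from the original post): after a backward pass, walk over named_parameters() and look at each gradient. A parameter that never received a gradient would show None.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the model in this thread; sizes are made up.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 5),
    torch.nn.ReLU(),
    torch.nn.Linear(5, 3),
)

x = torch.randn(8, 4)
y = torch.randint(0, 3, (8,))

loss = torch.nn.functional.cross_entropy(model(x), y)
loss.backward()

# Gradient norm per parameter; None means no gradient was computed for it.
grad_norms = {
    name: (None if p.grad is None else p.grad.norm().item())
    for name, p in model.named_parameters()
}
for name, norm in grad_norms.items():
    print(name, norm)
```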
If I may speculate a bit… What kind of loss are you using?
Supposing CrossEntropyLoss, you should remove the softmax layer in your model.
This criterion combines LogSoftMax and NLLLoss in one single class.
The input is expected to contain scores for each class.
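To illustrate what the documentation describes, here is a small sketch on random dummy logits (the shapes are arbitrary assumptions). It compares the loss on raw scores with NLLLoss on log-softmax outputs, and then shows what happens when probabilities are passed in instead, so that softmax is effectively applied twice.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(8, 3)             # raw, unnormalized class scores
targets = torch.randint(0, 3, (8,))

# Cross entropy on logits equals NLL loss on log-softmax outputs.
ce_on_logits = F.cross_entropy(logits, targets)
nll_on_log_probs = F.nll_loss(F.log_softmax(logits, dim=1), targets)

# Passing probabilities applies softmax a second time inside the loss.
ce_on_probs = F.cross_entropy(F.softmax(logits, dim=1), targets)

print(ce_on_logits.item(), nll_on_log_probs.item(), ce_on_probs.item())
```

Because probabilities are confined to [0, 1], the second softmax can never become sharply peaked, so the loss on probabilities cannot approach zero (for three classes it stays above roughly 0.55) and the gradients tend to be weaker.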
Yeah, the gradients seem to be zeroed successfully in each iteration. I checked this as follows:
for epoch in range(num_epochs):
    for batch_idx, (features, targets) in enumerate(train_loader):
        features = Variable(features.view(-1, 28*28))
        targets = Variable(targets)
        features, targets = features.cuda(), targets.cuda()
        outputs = model(features)
        cost = cost_fn(outputs, targets)
Good point. However, when I remove the custom weight init, the model learns (weights and biases are being updated instead of frozen), so I would assume the issue should be independent of the loss?
Side question: Which loss should then be used with softmax in the last layer?
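As far as I know, NLLLoss is the way to go, but note that it expects log-probabilities, so the last layer would need to be log_softmax rather than softmax.
Alternatively logits +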
You are right, that doesn’t answer the issue. Let me check your model again.
That’s strange. Using dummy data, the weights get updated in both cases (with and without the custom parameter initialization).
What are the gradients after the
Hm … I cannot reproduce this issue anymore. I think it was related to one or more bugs/misconceptions. E.g., I didn’t know that the ToTensor() transform in torchvision already scales pixel values from [0, 255] to [0, 1], so I normalized the input images twice. After that, the inputs had super small values (i.e., pixel/255/255). Combined with the suboptimal weight initialization and passing probabilities to the CrossEntropyLoss, this probably kept the network from learning. Anyway, thanks for looking into that!
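The effect of the double normalization can be sketched with plain tensors. This assumes a made-up batch of 8-bit grayscale images and mimics the division by 255 with ordinary arithmetic rather than calling the transform itself.

```python
import torch

torch.manual_seed(0)

# Stand-in for a batch of 8-bit grayscale images (values 0..255).
raw = torch.randint(0, 256, (2, 1, 28, 28), dtype=torch.uint8)
raw[0, 0, 0, 0] = 255  # make sure the maximum value is present

scaled_once = raw.float() / 255.0    # range [0, 1], as after ToTensor()
scaled_twice = scaled_once / 255.0   # the accidental second normalization

print("max after one scaling: ", scaled_once.max().item())
print("max after two scalings:", scaled_twice.max().item())
```

After the second division, even the brightest pixel is below 0.004, so the inputs carry very little signal into the first layer.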