Model weights not being updated

Hi! I’ve had the same problem: the loss just stayed the same.
This is my model:

class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super().__init__()

        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.lstm = torch.nn.LSTM(input_size=input_size,
                                  hidden_size=hidden_size,
                                  num_layers=n_layers,
                                  dropout=0.3,
                                  batch_first=True)

        self.out = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, input_arr, hidden):

        output, (hidden_state, cell_state) = self.lstm(input_arr, hidden)
        output = self.out(output)

        # output = self.relu(output)  <-- Here was the problem

        return output, (hidden_state, cell_state)

I think the model with random weights predicts a lot of negative values, but the ReLU cuts them all to zero. MSELoss then can’t do much with the constant zero outputs (the ReLU gradient is zero for negative inputs), so the weights aren’t updated.

But when I removed the ReLU activation, it worked just fine.
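To make this concrete, here is a tiny sketch of the effect (a toy linear layer, not my model; the weights are forced negative on purpose): when every pre-activation is negative, ReLU outputs zeros, its gradient is zero, and the layer receives no update signal.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy layer with weights forced negative so that all pre-activations
# are negative for positive inputs.
lin = nn.Linear(4, 2)
with torch.no_grad():
    lin.weight.fill_(-1.0)
    lin.bias.fill_(-1.0)

x = torch.rand(8, 4)            # positive inputs -> negative pre-activations
target = torch.rand(8, 2)

out = torch.relu(lin(x))        # all zeros after ReLU
loss = nn.MSELoss()(out, target)
loss.backward()

print(lin.weight.grad.abs().sum())  # tensor(0.) -- no gradient reaches the layer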

P.S. Thanks to everybody who contributed to this discussion. It helped me a lot!


Thank you very much, sir. The hint was really helpful.

I am facing a similar problem: the loss of my model is not changing.
My model definition:
class DocLSTM(nn.Module):
    def __init__(self, vocab_size, in_dim, mem_dim, sparsity, freeze):
        super(DocLSTM, self).__init__()
        self.emb = nn.Embedding(vocab_size, in_dim,
                                padding_idx=Constants.PAD, sparse=sparsity)
        if freeze:
            self.emb.weight.requires_grad = False
        self.body_LSTM = nn.LSTM(150, 150, 1)
        self.Para_LSTM = nn.LSTM(150, 150, 1)
        self.Headline_LSTM = nn.LSTM(300, 150, 1)
        self.childsumtreelstm = ChildSumTreeLSTM(in_dim, mem_dim)
        torch.manual_seed(0)
        self.sent_pad = torch.randn(1, 150)
        self.para_pad = torch.randn(1, 1, 150)
        self.word_pad = torch.randn(1, 300)

When I run the following after loss.backward():

list(self.model.parameters())[0].grad
list(self.doclstm.parameters())[0].grad
list(self.sent.parameters())[0].grad

For the first two modules, .grad returns None, but for the last one it returns a tensor of gradients. I am not able to figure out what the actual problem is.
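One generic way to narrow this down is to list which parameters actually received a gradient after loss.backward(). A sketch (the function below is mine, not part of the model):

def report_grads(model):
    # A parameter whose grad is None is not connected to the loss in the
    # autograd graph (or has requires_grad=False).
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"NO grad: {name}")
        else:
            print(f"grad ok: {name:40s} norm = {param.grad.norm().item():.3e}")

# e.g. right after loss.backward():
# report_grads(self.doclstm)
# report_grads(self.model)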

I am using view to change the dimensions here:

lstate, lhidden = self.childsumtreelstm(ltree, linputs, h_seq_hed.view(1, 150), 1)

Is that the reason the parameters are not being updated?
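For what it’s worth, .view by itself does not detach a tensor from the autograd graph; only operations like .detach() do. A minimal sketch, unrelated to the DocLSTM code above:

import torch

# .view only reshapes; gradients still flow through it.
w = torch.randn(3, 50, requires_grad=True)
h = (w * 2).view(1, 150)         # reshape inside the graph
h.sum().backward()
print(w.grad is not None)        # True -- view did not break the graph

# What does break the graph is detaching:
w2 = torch.randn(3, 50, requires_grad=True)
h2 = (w2 * 2).detach().view(1, 150)
# h2.sum().backward() would now raise an error, because h2 no longer
# requires grad -- w2 would never receive a gradient this way.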

Please help. Thank you in advance.

Isn’t that the same thing?


You meant requires_grad=False, right?

I have the same problem: my weights are not being updated, even though all weight layers have requires_grad=True.
Can you help me? Here are the weights:

weight layer: torch.Size([32]) Parameter containing:
tensor([1.0001, 1.0001, 0.9999, 1.0001, 1.0001, 0.9999, 0.9999, 0.9999, 1.0001,
1.0001, 1.0001, 0.9999, 0.9999, 0.9999, 1.0001, 0.9999, 0.9999, 0.9999,
1.0001, 1.0001, 0.9999, 1.0001, 1.0001, 1.0001, 0.9999, 0.9999, 1.0001,
0.9999, 0.9999, 0.9999, 1.0001, 1.0001], requires_grad=True)
weight layer: torch.Size([32]) Parameter containing:
tensor([ 6.5435e-05, 6.5572e-05, -6.5296e-05, 6.6796e-05, 6.5458e-05,
-6.5342e-05, -6.5205e-05, -6.5350e-05, 6.5402e-05, 6.5721e-05,
6.5457e-05, -6.5321e-05, -6.5283e-05, -6.5280e-05, 6.5410e-05,
-6.5351e-05, -6.5342e-05, -6.5063e-05, 6.5461e-05, 6.5426e-05,
-6.5271e-05, 6.5417e-05, 6.5417e-05, 6.5462e-05, -6.5357e-05,
-6.5296e-05, 6.5392e-05, -6.5317e-05, -6.5294e-05, -6.5305e-05,
6.5454e-05, 6.5511e-05], requires_grad=True)

Could you explain your use case a bit more and post a (minimal) executable code snippet to reproduce this issue, please?
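For reference, a minimal self-contained snippet of the kind that makes this easy to debug could look like the sketch below (the model, data, and hyperparameters are placeholders, not your actual code): one full training step, then a check of how much every parameter moved.

import torch
import torch.nn as nn

# Placeholder model and data -- substitute the real ones.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(16, 10)
y = torch.randn(16, 1)

before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# Report how much each parameter moved in this single step.
for (name, p), old in zip(model.named_parameters(), before):
    print(f"{name:15s} max |update| = {(p.detach() - old).abs().max().item():.3e}")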


I ran the REDWGAN model on CPU. The weight updates after 40 iterations with the Adam optimizer were very small; the generator’s weights changed only slightly. This is one of the layer weights, initially:

[[ 7.1380e-02, 1.3541e-02, -7.7020e-02],
[-3.7415e-02, -3.7045e-02, 3.1205e-02],
[ 8.8570e-03, -8.4177e-02, 5.3708e-03]]]]], requires_grad=True)
and after 40 iterations it was updated to:

      [[ 7.1350e-02,  1.3510e-02, -7.7051e-02],
       [-3.7445e-02, -3.7075e-02,  3.1175e-02],
       [ 8.8266e-03, -8.4208e-02,  5.3404e-03]]]]], requires_grad=True)

And the two images, the input and the denoised output from REDWGAN, are the same.


Based on your description, the weights are indeed being updated, so it doesn’t seem to be an issue of static (detached) weights.
If you are concerned about the gradient magnitude (i.e. the size of the weight updates themselves), you could play around with some hyperparameters, such as increasing the learning rate.
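For example (a generic sketch with a placeholder network, not the REDWGAN code), the learning rate can be set when the optimizer is created, or raised afterwards through its param groups:

import torch
import torch.nn as nn

# Placeholder network standing in for the generator.
generator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1))

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

# If the per-step weight changes look too small, raise the learning rate
# in place through the optimizer's param groups:
for group in optimizer.param_groups:
    group["lr"] = 1e-3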


I’ve already changed the learning rate; in my case it didn’t help.