Model weights not being updated

Hi! I’ve had the same problem: the loss just stayed the same.
This is my model:

class Model(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super().__init__()

        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.lstm = torch.nn.LSTM(input_size=input_size,
                                  hidden_size=hidden_size,
                                  num_layers=n_layers,
                                  dropout=0.3,
                                  batch_first=True)

        self.out = nn.Linear(hidden_size, output_size)
        self.relu = nn.ReLU()

    def forward(self, input_arr, hidden):

        output, (hidden_state, cell_state) = self.lstm(input_arr, hidden)
        output = self.out(output)

        # output = self.relu(output)  <-- Here was the problem

        return output, (hidden_state, cell_state)

I think the model with random weights predicts a lot of negative values, but the ReLU cuts them all to zero. MSELoss then can’t do much with the constant zero outputs (the ReLU gradient is zero for negative inputs), so the weights aren’t updated.

But when I removed the ReLU activation, it worked just fine.
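To make this concrete, here is a tiny sketch of the effect (a toy linear layer, not my model; the weights are forced negative on purpose): when every pre-activation is negative, ReLU outputs zeros, its gradient is zero, and the layer receives no update signal.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy layer with weights forced negative so that all pre-activations
# are negative for positive inputs.
lin = nn.Linear(4, 2)
with torch.no_grad():
    lin.weight.fill_(-1.0)
    lin.bias.fill_(-1.0)

x = torch.rand(8, 4)            # positive inputs -> negative pre-activations
target = torch.rand(8, 2)

out = torch.relu(lin(x))        # all zeros after ReLU
loss = nn.MSELoss()(out, target)
loss.backward()

print(lin.weight.grad.abs().sum())  # tensor(0.) -- no gradient reaches the layer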

P.S. Thanks to everybody who contributed to this discussion. It helped me a lot!


Thank you very much, sir. The hint was really helpful.

I am facing a similar problem: the loss of my model is not changing.
My model definition:
class DocLSTM(nn.Module):
    def __init__(self, vocab_size, in_dim, mem_dim, sparsity, freeze):
        super(DocLSTM, self).__init__()
        self.emb = nn.Embedding(vocab_size, in_dim,
                                padding_idx=Constants.PAD, sparse=sparsity)
        if freeze:
            self.emb.weight.requires_grad = False
        self.body_LSTM = nn.LSTM(150, 150, 1)
        self.Para_LSTM = nn.LSTM(150, 150, 1)
        self.Headline_LSTM = nn.LSTM(300, 150, 1)
        self.childsumtreelstm = ChildSumTreeLSTM(in_dim, mem_dim)
        torch.manual_seed(0)
        self.sent_pad = torch.randn(1, 150)
        self.para_pad = torch.randn(1, 1, 150)
        self.word_pad = torch.randn(1, 300)

When I run the following after loss.backward():

list(self.model.parameters())[0].grad
list(self.doclstm.parameters())[0].grad
list(self.sent.parameters())[0].grad

For the first two modules, .grad returns None, but for the last one it returns a tensor of gradients. I am not able to figure out what the actual problem is.
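One generic way to narrow this down is to list which parameters actually received a gradient after loss.backward(). A sketch (the function below is mine, not part of the model):

def report_grads(model):
    # A parameter whose grad is None is not connected to the loss in the
    # autograd graph (or has requires_grad=False).
    for name, param in model.named_parameters():
        if param.grad is None:
            print(f"NO grad: {name}")
        else:
            print(f"grad ok: {name:40s} norm = {param.grad.norm().item():.3e}")

# e.g. right after loss.backward():
# report_grads(self.doclstm)
# report_grads(self.model)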

I am using view to change the dimensions here:

lstate, lhidden = self.childsumtreelstm(ltree, linputs, h_seq_hed.view(1, 150), 1)

Is that the reason the parameters are not being updated?
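For what it’s worth, .view by itself does not detach a tensor from the autograd graph; only operations like .detach() do. A minimal sketch, unrelated to the DocLSTM code above:

import torch

# .view only reshapes; gradients still flow through it.
w = torch.randn(3, 50, requires_grad=True)
h = (w * 2).view(1, 150)         # reshape inside the graph
h.sum().backward()
print(w.grad is not None)        # True -- view did not break the graph

# What does break the graph is detaching:
w2 = torch.randn(3, 50, requires_grad=True)
h2 = (w2 * 2).detach().view(1, 150)
# h2.sum().backward() would now raise an error, because h2 no longer
# requires grad -- w2 would never receive a gradient this way.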

Please help. Thank you in advance.

Isn’t that the same thing?


You meant requires_grad=False, right?

I have the same problem: my weights are not being updated, even though all weight layers have requires_grad=True.
Can you help me? Here are the weights:

weight layer: torch.Size([32]) Parameter containing:
tensor([1.0001, 1.0001, 0.9999, 1.0001, 1.0001, 0.9999, 0.9999, 0.9999, 1.0001,
1.0001, 1.0001, 0.9999, 0.9999, 0.9999, 1.0001, 0.9999, 0.9999, 0.9999,
1.0001, 1.0001, 0.9999, 1.0001, 1.0001, 1.0001, 0.9999, 0.9999, 1.0001,
0.9999, 0.9999, 0.9999, 1.0001, 1.0001], requires_grad=True)
weight layer: torch.Size([32]) Parameter containing:
tensor([ 6.5435e-05, 6.5572e-05, -6.5296e-05, 6.6796e-05, 6.5458e-05,
-6.5342e-05, -6.5205e-05, -6.5350e-05, 6.5402e-05, 6.5721e-05,
6.5457e-05, -6.5321e-05, -6.5283e-05, -6.5280e-05, 6.5410e-05,
-6.5351e-05, -6.5342e-05, -6.5063e-05, 6.5461e-05, 6.5426e-05,
-6.5271e-05, 6.5417e-05, 6.5417e-05, 6.5462e-05, -6.5357e-05,
-6.5296e-05, 6.5392e-05, -6.5317e-05, -6.5294e-05, -6.5305e-05,
6.5454e-05, 6.5511e-05], requires_grad=True)

Could you explain your use case a bit more and post a (minimal) executable code snippet to reproduce this issue, please?
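For reference, a minimal self-contained snippet of the kind that makes this easy to debug could look like the sketch below (the model, data, and hyperparameters are placeholders, not your actual code): one full training step, then a check of how much every parameter moved.

import torch
import torch.nn as nn

# Placeholder model and data -- substitute the real ones.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.randn(16, 10)
y = torch.randn(16, 1)

before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
optimizer.step()

# Report how much each parameter moved in this single step.
for (name, p), old in zip(model.named_parameters(), before):
    print(f"{name:15s} max |update| = {(p.detach() - old).abs().max().item():.3e}")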


I ran the REDWGAN model on CPU. The weight updates after 40 iterations with the Adam optimizer were very small; the generator’s weights changed only slightly. This is one of the layer weights, initially:

[[ 7.1380e-02, 1.3541e-02, -7.7020e-02],
[-3.7415e-02, -3.7045e-02, 3.1205e-02],
[ 8.8570e-03, -8.4177e-02, 5.3708e-03]]]]], requires_grad=True)
and after 40 iterations it was updated to:

      [[ 7.1350e-02,  1.3510e-02, -7.7051e-02],
       [-3.7445e-02, -3.7075e-02,  3.1175e-02],
       [ 8.8266e-03, -8.4208e-02,  5.3404e-03]]]]], requires_grad=True)

And the two images, the input and the denoised output from REDWGAN, are the same.


Based on your description, the weights are indeed being updated, so it doesn’t seem to be an issue of static (detached) weights.
If you are concerned about the gradient magnitude (i.e. the size of the weight updates themselves), you could play around with some hyperparameters, such as increasing the learning rate.
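For example (a generic sketch with a placeholder network, not the REDWGAN code), the learning rate can be set when the optimizer is created, or raised afterwards through its param groups:

import torch
import torch.nn as nn

# Placeholder network standing in for the generator.
generator = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1))

optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

# If the per-step weight changes look too small, raise the learning rate
# in place through the optimizer's param groups:
for group in optimizer.param_groups:
    group["lr"] = 1e-3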


I’ve already changed the learning rate; in my case it didn’t help.