At first I thought the parameters were not being updated at all after the loss calculation and the optimizer step, but the following code prints False, which proves there is an update:
before = list(model.parameters())[0].clone()  # snapshot the first parameter tensor
loss.backward()
optimizer.step()
after = list(model.parameters())[0].clone()   # snapshot again after the update
logging.info(torch.equal(before.data, after.data))  # prints False -> parameters did change
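To quantify the update itself (not just whether it happened), I could also compare the two snapshots directly, e.g. (a sketch reusing before and after from above):

# With plain SGD and no momentum, the update applied to each weight is
# lr * grad, so with lr=0.1 it should be about a tenth of the grad values.
delta = (after - before).abs()
logging.info("max update: %e  mean update: %e",
             delta.max().item(), delta.mean().item())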
Then I wanted to see the magnitude of the change, so I inspected the gradient of the same parameter:
list(model.parameters())[0].grad
This gives me very small values:
tensor([[ 3.2294e-07,  7.3983e-06,  6.5637e-06,  ..., -1.3529e-06,
         -4.9979e-06,  4.9799e-06],
        [ 5.5463e-08,  3.0771e-06, -9.5087e-07,  ...,  1.9265e-06,
         -4.5251e-06, -7.1564e-07],
        ...])
This is only part of the output, but it shows how small the values are.
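For a broader view than just the first parameter, I can also log the gradient norm of every parameter after loss.backward() (a small sketch):

# Log the gradient norm of each parameter to see whether the tiny
# gradients are specific to the first layer or affect the whole model.
for name, p in model.named_parameters():
    if p.grad is not None:
        logging.info("%s grad norm: %e", name, p.grad.norm().item())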
Here is my model:
self.lstm_hidden_dim = lstm_hidden_dim
self.lstm = nn.LSTM(word_embedding_dim, lstm_hidden_dim,
                    num_layers=1, batch_first=False, bidirectional=True)
self.firstHidden = nn.Linear(lstm_hidden_dim * 2, 300)
self.relu = nn.ReLU()
self.secondlinear = nn.Linear(300, score_space_size)
self.softmax = nn.LogSoftmax(dim=1)
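For reference, the forward pass is wired roughly like this (a sketch consistent with the layers above; the exact reshaping in my code may differ):

def forward(self, embeds):
    # embeds: (seq_len, batch, word_embedding_dim), since batch_first=False
    lstm_out, _ = self.lstm(embeds)        # (seq_len, batch, 2 * lstm_hidden_dim)
    hidden = self.relu(self.firstHidden(lstm_out))
    scores = self.secondlinear(hidden)     # (seq_len, batch, score_space_size)
    # flatten so that dim=1 of the LogSoftmax is the score dimension,
    # matching the (N, C) input that NLLLoss expects
    return self.softmax(scores.view(-1, scores.size(-1)))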
Loss function and optimizer:
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
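One training step is driven roughly like this (a sketch; sentence_embeds and targets stand in for my actual batch tensors):

model.train()
optimizer.zero_grad()                      # clear gradients accumulated last step
log_probs = model(sentence_embeds)         # (N, score_space_size) log-probabilities
loss = loss_function(log_probs, targets)   # NLLLoss consumes log-probabilities
loss.backward()
optimizer.step()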
Any idea why the gradients are so small?