Model performance not changing even though gradient values change


I defined a sentiment analysis model using an LSTM. I usually do this with torchtext, but this time I wanted to try it without. I used GloVe for the embeddings and a 2-layer LSTM connected to a fully connected (fc) layer.

My problem is that even though my gradients are changing, the loss doesn't change, and I don't understand why.
Model below:

glove_mat is a matrix of size vocabulary x 300 (300 is the size of one embedding).

Do you have any idea how to fix this?

What does your training loop for an epoch/batch look like, i.e. where do you call something like model.zero_grad() and optimizer.step()?
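For reference, here is a minimal sketch of the usual order of calls in a training step. The model, data, loss, and learning rate below are hypothetical stand-ins (a small nn.Linear on random data), not your actual setup. The point is only the sequence zero_grad(), forward, backward(), step().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier on random data,
# used here only to show the order of calls in a training step.
model = nn.Linear(10, 1)
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(64, 10)
labels = (inputs.sum(dim=1, keepdim=True) > 0).float()

losses = []
for epoch in range(20):
    model.train()
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(inputs)            # forward pass
    loss = criterion(logits, labels)  # compute the loss
    loss.backward()                   # populate .grad on the parameters
    optimizer.step()                  # apply the update to the parameters
    losses.append(loss.item())
```

If optimizer.step() is missing, or the optimizer was built from a different set of parameters than the model you are training, the gradients will still change but the weights (and so the loss) will not.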

Also, in the definition of your LSTM:

self.lstm = nn.LSTM(300,hidden_dim, n_layers, batch_first=True)

you probably want to pass your bidirectional parameter as well. Otherwise your LSTM is always unidirectional, and its output size might not match self.fc1 if bidirectional=True.
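Since your model code isn't shown here, the following is only a sketch under assumptions: the class name SentimentLSTM, the constructor arguments, the single-output fc1, and the choice to feed the final hidden state to fc1 are all hypothetical. It shows one way to pass bidirectional to nn.LSTM and size fc1 so the two stay consistent.

```python
import torch
import torch.nn as nn

class SentimentLSTM(nn.Module):
    """Hypothetical model; names and sizes are assumptions for illustration."""

    def __init__(self, hidden_dim, n_layers, bidirectional=False, embed_dim=300):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, n_layers,
                            batch_first=True, bidirectional=bidirectional)
        # A bidirectional LSTM yields two hidden states per layer,
        # so the input size of fc1 has to double.
        num_directions = 2 if bidirectional else 1
        self.fc1 = nn.Linear(hidden_dim * num_directions, 1)

    def forward(self, embedded):
        # embedded: (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)
        # h_n: (n_layers * num_directions, batch, hidden_dim)
        if self.lstm.bidirectional:
            # Last layer: h_n[-2] is the forward direction, h_n[-1] the backward one.
            last = torch.cat([h_n[-2], h_n[-1]], dim=1)
        else:
            last = h_n[-1]
        return self.fc1(last)

model = SentimentLSTM(hidden_dim=64, n_layers=2, bidirectional=True)
out = model(torch.randn(4, 12, 300))  # batch of 4 sequences, 12 tokens each
```

With this layout, switching bidirectional on or off changes the LSTM and fc1 together, so a shape mismatch between them cannot occur.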