Thanks for your help. I managed to make the network work by changing the optimizer: I was using SGD, but the loss was not changing; when I switched to an Adam optimizer, the loss started decreasing after 10 epochs.
import torch.optim as optim

# Initial optimizer (loss plateaued):
initial_optimizer = optim.SGD(model.parameters(), lr=1e-3, weight_decay=1e-6)
# Final optimizer (loss started decreasing):
final_optimizer = optim.Adam(model.parameters(),
                             lr=LEARNING_RATE,  # args.learning_rate - default is 5e-5, our notebook had 2e-5
                             eps=1e-8)          # args.adam_epsilon - default is 1e-8
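For context, this is roughly how the optimizer is used in my training loop; train_loader and criterion here are placeholders for my actual data loader and loss, not the exact code I ran:

model.train()
for inputs, targets in train_loader:
    final_optimizer.zero_grad()        # clear gradients from the previous step
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()                    # backpropagate
    final_optimizer.step()             # Adam update of all model parameters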
I still don’t understand what the problem with SGD was, since SGD’s fluctuation should, in principle, enable it to jump to new and potentially better local minima.
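To make the comparison concrete, here is a rough scalar sketch of the two update rules as I understand them (the real PyTorch implementations also handle momentum, weight decay, and other details):

import math

def sgd_step(theta, grad, lr=1e-3):
    # Plain SGD: a single global step size; the step is just lr * grad,
    # so small or badly scaled gradients translate directly into tiny steps.
    return theta - lr * grad

def adam_step(theta, grad, m, v, t, lr=5e-5, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: running estimates of the gradient mean (m) and uncentered
    # variance (v) give each parameter its own effective step size,
    # lr * m_hat / (sqrt(v_hat) + eps).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

So my question is whether the constant learning rate in plain SGD, rather than the fluctuation itself, is what kept the loss from moving in my case.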