Character level RNN does not converge in pytorch

(Mohammad Mehdi Derakhshani) #1


I wrote a character-level RNN (simple RNN, GRU, LSTM) in PyTorch on the tinyshakespeare dataset. Here is my code. The cost does not decay very well; here is my output:

Epoch: 1 Train Loss: 3.3599359338933774 in: 113 sec
Epoch: 2 Train Loss: 3.3160508415915753 in: 103 sec
Epoch: 3 Train Loss: 3.3149012392217463 in: 107 sec
Epoch: 4 Train Loss: 3.3135951562361283 in: 109 sec
Epoch: 5 Train Loss: 3.3133712205019874 in: 104 sec

It seems that vanishing gradients have occurred! Can anyone help me solve this problem?

(Mohammad Mehdi Derakhshani) #2

Can anyone help me please?

(Hugh Perkins) #3

The usual approach I would follow to debug this kind of thing is:

  • drop the learning rate a bunch, and
  • use some really simple data initially, maybe just one or two examples
  • => you should be able to overfit the simple data really easily
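The overfitting sanity check above can be sketched as a tiny script: a small LSTM trained to memorize a single short string. Everything here is illustrative (the model size, learning rate, and step count are arbitrary choices, not taken from the original code). One detail worth checking against your own code: `nn.CrossEntropyLoss` expects raw logits, so applying a softmax before it is a common cause of a loss that barely moves.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny "dataset": a single sentence to overfit (a sanity check, not real training).
text = "hello world, hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab = len(chars)

# Inputs are all characters except the last; targets are shifted by one.
x = torch.tensor([stoi[c] for c in text[:-1]]).unsqueeze(0)  # shape (1, T)
y = torch.tensor([stoi[c] for c in text[1:]]).unsqueeze(0)   # shape (1, T)

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, vocab)

    def forward(self, x):
        h, _ = self.lstm(self.emb(x))
        return self.fc(h)  # raw logits -- no softmax; CrossEntropyLoss applies log-softmax itself

model = CharLSTM(vocab)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # illustrative learning rate
loss_fn = nn.CrossEntropyLoss()

losses = []
for step in range(200):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits.view(-1, vocab), y.view(-1))
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}  last loss: {losses[-1]:.3f}")
```

If the setup is healthy, the loss should collapse toward zero on a single memorized sequence; if it plateaus here too, the bug is in the model or loss wiring rather than in the data or hyperparameters.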

I made a YouTube video about overfitting on a sequence of integers, not sure if that is useful?