Character level RNN does not converge in pytorch


I wrote a character-level RNN (simple RNN, GRU, LSTM) in PyTorch on the tinyshakespeare dataset. Here is my code. The loss does not decay well; here is my output:

Epoch: 1 Train Loss: 3.3599359338933774 in: 113 sec
Epoch: 2 Train Loss: 3.3160508415915753 in: 103 sec
Epoch: 3 Train Loss: 3.3149012392217463 in: 107 sec
Epoch: 4 Train Loss: 3.3135951562361283 in: 109 sec
Epoch: 5 Train Loss: 3.3133712205019874 in: 104 sec

It seems that vanishing gradients may be the problem! Can anyone help me solve this?
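A quick way to check whether gradients are actually vanishing is to print the norm of each parameter's gradient after `backward()`. A minimal sketch, where `model` and the dummy input stand in for the actual char-level RNN (which isn't shown) and the sizes are placeholders:

```python
import torch
import torch.nn as nn

# Stand-in model; substitute your actual char-level RNN here.
model = nn.LSTM(input_size=65, hidden_size=128, batch_first=True)

x = torch.randn(8, 50, 65)   # dummy (batch, seq_len, vocab) input
out, _ = model(x)
loss = out.sum()             # dummy loss, just to produce gradients
loss.backward()

# If these norms are ~0, gradients really are vanishing;
# if they look healthy, the bug is elsewhere (loss setup, data, LR, ...).
for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name}: grad norm = {p.grad.norm().item():.6f}")
```

If the norms look reasonable, a flat loss usually points at something else, e.g. passing softmax output into `nn.CrossEntropyLoss` (which expects raw logits) or a learning rate that is too high.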


The usual approach I follow to debug this kind of thing is:

  • drop the learning rate a bunch, and
  • use some really simple data initially, maybe just one or two examples
  • => you should be able to overfit the simple data really easily

I made a YouTube video about overfitting on a sequence of integers; not sure if that is useful.
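The overfitting check above can be sketched as follows. This is a minimal char-level GRU trained on a single short string; the architecture, hyperparameters, and corpus are all placeholder assumptions, not your code. If a loop like this can't drive the loss near zero, the training setup is broken:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny corpus: a healthy model + training loop should memorize this easily.
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in text])

class CharRNN(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, vocab)
    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.fc(h)

model = CharRNN(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()  # expects raw logits, not softmax output

x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)  # predict the next char
for step in range(300):
    opt.zero_grad()
    logits = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    loss.backward()
    opt.step()

print(loss.item())  # should be near zero if the training loop is correct
```

If this sanity check passes but the full tinyshakespeare run still plateaus, the problem is more likely in data preparation or the learning rate than in the model itself.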

Try a simple dataset like sine waves and see whether it converges. Otherwise, you might have to rethink your architecture. For example:

import numpy as np

# Generate N shifted sine-wave sequences of length L (period ~ 2*pi*T)
T = 20
L = 1000
N = 100

x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')
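To run the convergence check on this data, one option is a small next-step prediction model. A sketch, assuming a single-layer `nn.LSTM` with an arbitrary hidden size; the model and hyperparameters here are illustrative, not prescriptive:

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
np.random.seed(0)

# Same sine-wave data as above
T, L, N = 20, 1000, 100
x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')

# Predict each next value from the previous ones
inp = torch.from_numpy(data[:, :-1]).float().unsqueeze(-1)     # (N, L-1, 1)
target = torch.from_numpy(data[:, 1:]).float().unsqueeze(-1)   # (N, L-1, 1)

class Seq(nn.Module):
    def __init__(self, hidden=51):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, 1)
    def forward(self, x):
        h, _ = self.lstm(x)
        return self.fc(h)

model = Seq()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

for step in range(50):
    opt.zero_grad()
    loss = loss_fn(model(inp), target)
    loss.backward()
    opt.step()
    if step == 0:
        first_loss = loss.item()

# The MSE should drop well below its initial value within a few dozen steps;
# if it doesn't even on sine waves, the training loop itself is suspect.
print(first_loss, loss.item())
```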