I wrote a character-level RNN (a simple LSTM) in PyTorch on the tinyshakespeare dataset. Here is my code. The loss does not decrease very well. Here is my output:
Epoch: 1 Train Loss: 3.3599359338933774 in: 113 sec
Epoch: 2 Train Loss: 3.3160508415915753 in: 103 sec
Epoch: 3 Train Loss: 3.3149012392217463 in: 107 sec
Epoch: 4 Train Loss: 3.3135951562361283 in: 109 sec
Epoch: 5 Train Loss: 3.3133712205019874 in: 104 sec
It seems that vanishing gradients are happening! Can anyone help me solve this problem?
The usual approach I follow to debug this kind of issue is:
- drop the learning rate significantly, and
- start with some really simple data, maybe just one or two examples
- => you should be able to overfit that simple data very easily
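As a concrete version of the overfitting test above, here is a minimal sketch (all names and hyperparameters are illustrative, not taken from the original post): a tiny char-level LSTM trained on a single short string. If your setup is healthy, the loss on one example should collapse toward zero within a few hundred steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One tiny training example: predict the next character at every position.
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab = len(chars)

x = torch.tensor([stoi[c] for c in text[:-1]]).unsqueeze(0)  # (1, T) inputs
y = torch.tensor([stoi[c] for c in text[1:]]).unsqueeze(0)   # (1, T) next-char targets

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)  # (B, T, vocab) logits

model = CharLSTM(vocab)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

first_loss = None
for step in range(300):
    logits = model(x)
    loss = loss_fn(logits.view(-1, vocab), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"first loss: {first_loss:.4f}, final loss: {loss.item():.4f}")
```

If the loss stays flat even here, the problem is in the model or training loop itself (wrong target alignment, logits/targets shape mismatch, forgetting `zero_grad`, etc.), not in the dataset.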
I made a YouTube video about overfitting on a sequence of integers; not sure if that is useful.
Try a simple dataset like sine waves and see if the model converges. Otherwise, you might have to rethink your architecture.
import numpy as np

T = 20    # period of the sine wave
L = 1000  # sequence length
N = 100   # number of sequences

# Each row is the range 0..L-1 shifted by a random offset,
# so every sequence starts at a different phase.
x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')
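To feed this sine-wave data into an LSTM for next-step prediction, one simple framing (a sketch, assuming the same generation parameters as above) is to use each sequence shifted by one step as its own target:

```python
import numpy as np
import torch

np.random.seed(0)

# Same sine-wave generation as above.
T, L, N = 20, 1000, 100
x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')

# Next-step prediction: input is every value except the last,
# target is the same sequence shifted forward by one step.
# unsqueeze(-1) adds the feature dimension nn.LSTM expects.
inputs = torch.from_numpy(data[:, :-1]).float().unsqueeze(-1)   # (N, L-1, 1)
targets = torch.from_numpy(data[:, 1:]).float().unsqueeze(-1)   # (N, L-1, 1)

print(inputs.shape, targets.shape)
```

With `batch_first=True` on the LSTM, `inputs` can be passed in directly, and an MSE loss against `targets` gives the standard regression setup for this toy task.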