I wrote a character-level RNN (a simple LSTM) in PyTorch on the tinyshakespeare dataset. Here is my code. The loss does not decrease very well. Here is my output:
Epoch: 1 Train Loss: 3.3599359338933774 in: 113 sec
Epoch: 2 Train Loss: 3.3160508415915753 in: 103 sec
Epoch: 3 Train Loss: 3.3149012392217463 in: 107 sec
Epoch: 4 Train Loss: 3.3135951562361283 in: 109 sec
Epoch: 5 Train Loss: 3.3133712205019874 in: 104 sec
It seems that vanishing gradients are happening! Can anyone help me solve this problem?
The usual approach I follow to debug this kind of issue is:
- drop the learning rate significantly, and
- start with some really simple data, maybe just one or two examples
- => you should be able to overfit that simple data very easily
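As a concrete version of the overfitting test above, here is a minimal sketch (all names and hyperparameters are illustrative, not taken from the original post): a tiny char-level LSTM trained on a single short string. If your setup is healthy, the loss on one example should collapse toward zero within a few hundred steps.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One tiny training example: predict the next character at every position.
text = "hello world"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
vocab = len(chars)

x = torch.tensor([stoi[c] for c in text[:-1]]).unsqueeze(0)  # (1, T) inputs
y = torch.tensor([stoi[c] for c in text[1:]]).unsqueeze(0)   # (1, T) next-char targets

class CharLSTM(nn.Module):
    def __init__(self, vocab, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x):
        out, _ = self.lstm(self.embed(x))
        return self.head(out)  # (B, T, vocab) logits

model = CharLSTM(vocab)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

first_loss = None
for step in range(300):
    logits = model(x)
    loss = loss_fn(logits.view(-1, vocab), y.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    if first_loss is None:
        first_loss = loss.item()

print(f"first loss: {first_loss:.4f}, final loss: {loss.item():.4f}")
```

If the loss stays flat even here, the problem is in the model or training loop itself (wrong target alignment, logits/targets shape mismatch, forgetting `zero_grad`, etc.), not in the dataset.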
I made a YouTube video about overfitting on a sequence of integers; not sure if that is useful.
Try a simple dataset like sine waves and see if the model converges. Otherwise, you might have to rethink your architecture.
import numpy as np

T = 20    # period of the sine wave
L = 1000  # sequence length
N = 100   # number of sequences

# Each row is the range 0..L-1 shifted by a random offset,
# so every sequence starts at a different phase.
x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')
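To feed this sine-wave data into an LSTM for next-step prediction, one simple framing (a sketch, assuming the same generation parameters as above) is to use each sequence shifted by one step as its own target:

```python
import numpy as np
import torch

np.random.seed(0)

# Same sine-wave generation as above.
T, L, N = 20, 1000, 100
x = np.empty((N, L), 'int64')
x[:] = np.array(range(L)) + np.random.randint(-4 * T, 4 * T, N).reshape(N, 1)
data = np.sin(x / 1.0 / T).astype('float64')

# Next-step prediction: input is every value except the last,
# target is the same sequence shifted forward by one step.
# unsqueeze(-1) adds the feature dimension nn.LSTM expects.
inputs = torch.from_numpy(data[:, :-1]).float().unsqueeze(-1)   # (N, L-1, 1)
targets = torch.from_numpy(data[:, 1:]).float().unsqueeze(-1)   # (N, L-1, 1)

print(inputs.shape, targets.shape)
```

With `batch_first=True` on the LSTM, `inputs` can be passed in directly, and an MSE loss against `targets` gives the standard regression setup for this toy task.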