How to prevent GRU loss going to NaN?

In my project I want to map sentences (with word embeddings of size 100) to a vector of size 1536.
I have a GRU model:
`self.gru = nn.GRU(100, 900, 3).cuda(); self.gru2 = nn.GRU(900, 1536, 1).cuda()`

My problem is that after around 20 iterations my loss prints as NaN or (in rare cases) stays constant.
What makes it go to NaN? I can't imagine the loss is simply getting too big, as it jumps from 20,000 straight to NaN.
My batches are of shape (68, 45, 100), and I initialize my hidden states from a uniform distribution over [0, 1]. I've varied my learning rate, batch size, optimizer, gradient clipping value, and cost function.
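
For context, here is a rough sketch of how the model and hidden states are set up (the class name `SentenceEncoder`, the `predict` signature, and the exact hidden-state shapes are paraphrased for illustration, not the literal code):

```python
import torch as tr
import torch.nn as nn
from torch.autograd import Variable


class SentenceEncoder(nn.Module):  # class name is just for illustration
    def __init__(self):
        super(SentenceEncoder, self).__init__()
        # 100-d word embeddings -> 900-d hidden (3 layers) -> 1536-d output (1 layer)
        self.gru = nn.GRU(100, 900, 3).cuda()
        self.gru2 = nn.GRU(900, 1536, 1).cuda()
        # hidden states drawn from a uniform distribution on [0, 1];
        # shapes assume (seq_len, batch, features) input ordering, i.e. a batch of 45
        self.h = Variable(tr.rand(3, 45, 900)).cuda()
        self.h2 = Variable(tr.rand(1, 45, 1536)).cuda()

    def predict(self, x, h, h2):
        out, h = self.gru(x, h)       # (68, 45, 100) -> (68, 45, 900)
        out, h2 = self.gru2(out, h2)  # (68, 45, 900) -> (68, 45, 1536)
        return out, h, h2
```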
This is my training code:
```python
from math import isnan

import torch as tr
from torch.autograd import Variable as aVar

for i in range(epochs):
    # load_batch() returns inputs of shape (68, 45, 100) and the target vectors
    data, targets = load_batch()
    data = aVar(tr.FloatTensor(data)).cuda()
    targets = aVar(tr.FloatTensor(targets)).cuda()

    # h and h2 are the initial hidden states (defined elsewhere)
    preds, model.h, model.h2 = model.predict(data, h, h2)

    model.zero_grad()
    loss = cost(preds, targets)
    loss.backward()
    tr.nn.utils.clip_grad_norm(model.parameters(), g_clip)

    if isnan(loss.data[0]):
        exit('Value is nan')
    opt.step()
    print(loss.data[0], i)
```

Edit:
On further inspection, it seems the min and max values of my hidden states and of the cell outputs all go to +/- 3.4e-38.
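
This is roughly how I checked (a simplified sketch of the check I dropped into the training loop; `report` is just a helper written for this):

```python
def report(name, var):
    # var is a Variable (or tensor); print its min/max to spot underflow or NaN
    t = var.data if hasattr(var, 'data') else var
    print('%s  min: %.3e  max: %.3e' % (name, float(t.min()), float(t.max())))

preds, model.h, model.h2 = model.predict(data, h, h2)
report('preds', preds)
report('h', model.h)
report('h2', model.h2)
```

For reference, 3.4e-38 is close to the smallest normal float32 magnitude (about 1.2e-38), so the activations look like they are collapsing toward zero rather than overflowing.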