Say I have an RNN-like model which has a loss at every step:
```python
for step in range(step_cnt):
    output, hidden = rnn(input, hidden)
    # accumulate the per-step loss into one total
    loss = loss + criterion(output, target)
```
When I call `loss.backward()` on the accumulated loss, will gradients accumulate at every step of the RNN? And won't this accumulation lead to exploding gradients?
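To make the question concrete, here is a minimal runnable sketch of what I want to check (it uses `nn.RNNCell` with made-up sizes, not my actual model): whether a single backward on the summed loss leaves the same accumulated gradient in `.grad` as calling backward once per step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNNCell(input_size=4, hidden_size=8)   # stand-in for the real model
head = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(5, 1, 4)                   # 5 steps, batch size 1
targets = torch.randint(0, 3, (5, 1))           # one class label per step

def accumulated_grad(sum_then_backward):
    rnn.zero_grad()
    head.zero_grad()
    hidden = torch.zeros(1, 8)
    total = 0.0
    for step in range(5):
        hidden = rnn(inputs[step], hidden)
        step_loss = criterion(head(hidden), targets[step])
        if sum_then_backward:
            total = total + step_loss
        else:
            # backward per step: gradients accumulate into .grad
            step_loss.backward(retain_graph=True)
    if sum_then_backward:
        total.backward()
    return rnn.weight_ih.grad.clone()

# both variants should leave the same gradient in .grad
print(torch.allclose(accumulated_grad(True), accumulated_grad(False)))
```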
By the way, if I average the loss with `loss = loss / step_cnt`, will the gradients be different from the sum version?
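And this is the sum-vs-mean comparison I have in mind (again with made-up sizes and a simple MSE loss just for illustration); my expectation is that dividing by `step_cnt` only rescales every gradient by `1/step_cnt` without changing its direction, but I would like to confirm that:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
step_cnt = 5
rnn = nn.RNNCell(4, 8)
inputs = torch.randn(step_cnt, 1, 4)
target = torch.zeros(1, 8)

def grad_of(average):
    rnn.zero_grad()
    hidden = torch.zeros(1, 8)
    loss = 0.0
    for step in range(step_cnt):
        hidden = rnn(inputs[step], hidden)
        loss = loss + F.mse_loss(hidden, target)
    if average:
        loss = loss / step_cnt          # the averaged variant
    loss.backward()
    return rnn.weight_ih.grad.clone()

g_sum, g_mean = grad_of(False), grad_of(True)
print(torch.allclose(g_sum / step_cnt, g_mean))   # only the scale should differ
```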