Say I have an RNN-like model which has a loss at every step:
```python
for step in range(step_cnt):
    output, hidden = rnn(input, hidden)
    # accumulate the per-step loss into one total
    loss = loss + criterion(output, target)
```
When I call `loss.backward()` on the accumulated loss, will gradients accumulate at every step of the RNN? And won't this accumulation lead to exploding gradients?
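To make the question concrete, here is a minimal runnable sketch of what I want to check (it uses `nn.RNNCell` with made-up sizes, not my actual model): whether a single backward on the summed loss leaves the same accumulated gradient in `.grad` as calling backward once per step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
rnn = nn.RNNCell(input_size=4, hidden_size=8)   # stand-in for the real model
head = nn.Linear(8, 3)
criterion = nn.CrossEntropyLoss()

inputs = torch.randn(5, 1, 4)                   # 5 steps, batch size 1
targets = torch.randint(0, 3, (5, 1))           # one class label per step

def accumulated_grad(sum_then_backward):
    rnn.zero_grad()
    head.zero_grad()
    hidden = torch.zeros(1, 8)
    total = 0.0
    for step in range(5):
        hidden = rnn(inputs[step], hidden)
        step_loss = criterion(head(hidden), targets[step])
        if sum_then_backward:
            total = total + step_loss
        else:
            # backward per step: gradients accumulate into .grad
            step_loss.backward(retain_graph=True)
    if sum_then_backward:
        total.backward()
    return rnn.weight_ih.grad.clone()

# both variants should leave the same gradient in .grad
print(torch.allclose(accumulated_grad(True), accumulated_grad(False)))
```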
By the way, if I average the loss with `loss = loss / step_cnt`, will the gradients be different from the sum version?
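And this is the sum-vs-mean comparison I have in mind (again with made-up sizes and a simple MSE loss just for illustration); my expectation is that dividing by `step_cnt` only rescales every gradient by `1/step_cnt` without changing its direction, but I would like to confirm that:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
step_cnt = 5
rnn = nn.RNNCell(4, 8)
inputs = torch.randn(step_cnt, 1, 4)
target = torch.zeros(1, 8)

def grad_of(average):
    rnn.zero_grad()
    hidden = torch.zeros(1, 8)
    loss = 0.0
    for step in range(step_cnt):
        hidden = rnn(inputs[step], hidden)
        loss = loss + F.mse_loss(hidden, target)
    if average:
        loss = loss / step_cnt          # the averaged variant
    loss.backward()
    return rnn.weight_ih.grad.clone()

g_sum, g_mean = grad_of(False), grad_of(True)
print(torch.allclose(g_sum / step_cnt, g_mean))   # only the scale should differ
```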