I’m having the same problem while trying to implement variational dropout, where the same mask is reused across time steps.
I tried mask.detach() and also Variable(mask, requires_grad=False), and even cloning the mask on every forward pass — I still get NaNs after a few iterations…
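For reference, here is a minimal sketch of the kind of locked-mask dropout I’m doing (class and variable names are just placeholders; the mask is sampled once per batch and broadcast over all time steps, with inverted-dropout scaling):

```python
import torch
import torch.nn as nn

class VariationalDropout(nn.Module):
    """Variational ("locked") dropout: sample one Bernoulli mask per
    sequence and reuse it at every time step."""

    def __init__(self, p=0.5):
        super().__init__()
        self.p = p

    def forward(self, x):
        # x: (seq_len, batch, features)
        if not self.training or self.p == 0.0:
            return x
        # Sample the mask once (it carries no gradient itself) and
        # scale by 1/(1-p) so expected activations are unchanged.
        mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
        mask = mask / (1 - self.p)
        # The same mask is broadcast across the seq_len dimension.
        return x * mask
```

In eval mode the input passes through unchanged; in train mode every time step sees the identical mask.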
See my post for more details: implementing-variational-dropout-cause-nan-values.
Thanks