Applying a mask causes NaN grad

I’m having the same problem while trying to implement variational dropout, where the same mask is reused across every time step.
I’ve tried mask.detach() and Variable(mask, requires_grad=False), and even cloning the mask on every forward pass, but I still get NaNs after a few iterations…
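
For context, here’s a minimal sketch of the kind of thing I’m doing (the module name and tensor shapes are just illustrative, not my exact code) — the mask is sampled once per sequence under torch.no_grad() so it never enters the graph, then reused across all time steps:

```python
import torch
import torch.nn as nn

class VariationalDropout(nn.Module):
    """Variational (locked) dropout: sample one mask per sequence
    and reuse it at every time step. (Illustrative sketch only.)"""
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x):  # assumed shape: (seq_len, batch, features)
        if not self.training or self.p == 0.0:
            return x
        # Build the mask under no_grad so it carries no gradient at all,
        # instead of detaching it after the fact.
        with torch.no_grad():
            mask = x.new_empty(1, x.size(1), x.size(2)).bernoulli_(1 - self.p)
            mask = mask / (1 - self.p)  # inverted-dropout scaling (assumes p < 1)
        # Broadcasting reuses the same (1, batch, features) mask over seq_len.
        return x * mask

drop = VariationalDropout(p=0.5)
x = torch.randn(10, 4, 8, requires_grad=True)
drop(x).sum().backward()  # gradient flows through x only, not the mask
```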
See my post for more details: implementing-variational-dropout-cause-nan-values.
Thanks