I’m trying to port some TensorFlow code to PyTorch in which a whole tensor is occasionally dropped out. This is done by multiplying the entire tensor by zero.

If I do the same in PyTorch, I get an error from autograd about an invalid in-place operation:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 32, 32, 32]], which is output 0 of LeakyReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

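As the hint suggests, the offending operation can be pinpointed by switching on anomaly detection before the forward pass:

```
import torch

# with this enabled, the backward error includes a traceback of the
# forward operation whose gradient computation failed
torch.autograd.set_detect_anomaly(True)
```
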
Here is the original TensorFlow code (“context” is the tensor):

```
# maybe_dropout is 1.0 if the uniform sample exceeds the drop rate, else 0.0
maybe_dropout = tf.cast(tf.math.greater(tf.random.uniform([]), self._drop_out_rate), tf.float32)
context *= maybe_dropout
```

Here is my PyTorch implementation:

```
maybe_dropout = (torch.rand([]) > self._drop_out_rate).type(torch.get_default_dtype())
context *= maybe_dropout  # in-place multiplication; this is the op autograd complains about
```
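
I think the error can be reproduced outside my model with just a few lines. The sketch below is my assumption of a minimal repro (it uses sigmoid instead of my actual layers, because autograd saves the sigmoid output for the backward pass); it fails with the same “modified by an inplace operation” message:

```
import torch

x = torch.randn(4, requires_grad=True)
context = torch.sigmoid(x)   # sigmoid's output is saved for the backward pass

maybe_dropout = (torch.rand([]) > 0.5).type(torch.get_default_dtype())
context *= maybe_dropout     # in-place multiply bumps the tensor's version counter

context.sum().backward()     # RuntimeError: ... modified by an inplace operation
```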

I also tried simply setting the tensor’s contents to zero:

```
context[:] = 0
```

but this resulted in NaN gradients.

The only thing that worked was detaching the tensor before setting it to zero:

```
context = context.detach()
context[:] = 0
```

but this doesn’t seem quite right. What would be the correct way to implement this kind of dropout behavior?
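
For example, would an out-of-place version like the sketch below be the intended approach (my assumption being that the problem is only the in-place `*=`), or is there a cleaner built-in way to randomly zero a whole tensor?

```
# assumption: reassigning instead of mutating leaves the tensor autograd saved untouched
keep = (torch.rand([], device=context.device) > self._drop_out_rate).to(context.dtype)
context = context * keep
```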