I’m trying to port some TensorFlow code to PyTorch in which a whole tensor is occasionally dropped out. This is done by multiplying the entire tensor by zero.

If I do the same in PyTorch, I get an error from autograd about an invalid in-place operation:

```
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64, 32, 32, 32]], which is output 0 of LeakyReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
```

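As the hint suggests, the offending operation can be pinpointed by switching on anomaly detection before the forward pass:

```
import torch

# with this enabled, the backward error includes a traceback of the
# forward operation whose gradient computation failed
torch.autograd.set_detect_anomaly(True)
```
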
Here is the original TensorFlow code (“context” is the tensor):

```
# maybe_dropout is 1.0 if the uniform sample exceeds the drop rate, else 0.0
maybe_dropout = tf.cast(tf.math.greater(tf.random.uniform([]), self._drop_out_rate), tf.float32)
context *= maybe_dropout
```

Here is my PyTorch implementation:

```
maybe_dropout = (torch.rand([]) > self._drop_out_rate).type(torch.get_default_dtype())
context *= maybe_dropout  # in-place multiplication; this is the op autograd complains about
```
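
I think the error can be reproduced outside my model with just a few lines. The sketch below is my assumption of a minimal repro (it uses sigmoid instead of my actual layers, because autograd saves the sigmoid output for the backward pass); it fails with the same “modified by an inplace operation” message:

```
import torch

x = torch.randn(4, requires_grad=True)
context = torch.sigmoid(x)   # sigmoid's output is saved for the backward pass

maybe_dropout = (torch.rand([]) > 0.5).type(torch.get_default_dtype())
context *= maybe_dropout     # in-place multiply bumps the tensor's version counter

context.sum().backward()     # RuntimeError: ... modified by an inplace operation
```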

I also tried simply setting the tensor’s contents to zero:

```
context[:] = 0
```

but this resulted in NaN gradients.

The only thing that worked was detaching the tensor before setting it to zero:

```
context = context.detach()
context[:] = 0
```

but this doesn’t seem quite right. What would be the correct way to implement this kind of dropout behavior?
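
For example, would an out-of-place version like the sketch below be the intended approach (my assumption being that the problem is only the in-place `*=`), or is there a cleaner built-in way to randomly zero a whole tensor?

```
# assumption: reassigning instead of mutating leaves the tensor autograd saved untouched
keep = (torch.rand([], device=context.device) > self._drop_out_rate).to(context.dtype)
context = context * keep
```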