Confusion about using .clone

clone can be used, e.g., on activations that are passed to multiple modules, where one of the modules might manipulate the activation in-place.
Here is a small example:

import torch
import torch.nn as nn

# Setup
module1 = nn.Sequential(
    nn.ReLU(inplace=True),
    nn.Linear(10, 1))

module2 = nn.Sequential(
    nn.Linear(10, 2))


torch.manual_seed(2809)
act = torch.randn(1, 10)
print(act)

# Wrong: module1's ReLU modifies act in-place, so module2 receives the modified tensor
out1 = module1(act)
print(act)
out2 = module2(act)

# Right
torch.manual_seed(2809)
act = torch.randn(1, 10)
print(act)

# Right: module1 operates on a clone, so act stays unchanged for module2
out1 = module1(act.clone())
print(act)
out2 = module2(act)

If you don’t clone the activation, the first module would apply the ReLU on it in-place, and the subsequent call to module2 would receive the already-modified tensor.

If act was created by previous operations (layers), Autograd will still compute all gradients correctly, since the clone is part of the computation graph.
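
For illustration, here is a minimal sketch (the tensors and shapes are made up for this example) showing that gradients flow back through .clone() to the original tensor:

import torch

x = torch.randn(3, requires_grad=True)
y = x.clone()            # the clone is recorded in the autograd graph
loss = (y * 2).sum()
loss.backward()
print(x.grad)            # tensor([2., 2., 2.]) -- gradients reach x through the clone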
