`clone` can be used e.g. on activations that should be passed to multiple modules, where each module might manipulate the activation in-place.
Here is a small example:
```python
import torch
import torch.nn as nn

# Setup
module1 = nn.Sequential(
    nn.ReLU(inplace=True),
    nn.Linear(10, 1))
module2 = nn.Sequential(
    nn.Linear(10, 2))

torch.manual_seed(2809)
act = torch.randn(1, 10)
print(act)

# Wrong, since act will be modified inplace by the ReLU in module1
out1 = module1(act)
print(act)           # act has changed
out2 = module2(act)  # module2 receives the modified tensor

# Right
torch.manual_seed(2809)
act = torch.randn(1, 10)
print(act)

out1 = module1(act.clone())  # the clone is modified inplace, act is not
print(act)                   # act is unchanged
out2 = module2(act)
```
If you don't clone the activation, the first module will apply the ReLU in-place on it, so the call to module2 would receive the already modified (wrong) tensor.
If `act` was created by previous operations (layers), Autograd will still properly calculate all gradients, since `clone()` is differentiable and simply passes the gradient through.
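As a minimal sketch of that last point (the preceding layer `prev` below is a hypothetical stand-in for whatever created `act`), gradients from both branches flow back through the clone into the preceding layer:

```python
import torch
import torch.nn as nn

# Hypothetical preceding layer that creates act
prev = nn.Linear(10, 10)

# Same setup as above
module1 = nn.Sequential(
    nn.ReLU(inplace=True),
    nn.Linear(10, 1))
module2 = nn.Sequential(
    nn.Linear(10, 2))

x = torch.randn(1, 10)
act = prev(x)  # act now has a gradient history

out1 = module1(act.clone())  # inplace ReLU modifies the clone, not act
out2 = module2(act)

(out1.sum() + out2.sum()).backward()
print(prev.weight.grad)  # gradients from both branches reached prev
```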