Autograd won’t copy the gradients, but it will properly backpropagate through all models up to the first parameter that requires gradients.
E.g. you could also freeze all models and set requires_grad=True for the input, and you would still get valid gradients for the input tensor. The frozen parameters won’t get their .grad attribute populated.
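
A minimal sketch of this setup (the small nn.Sequential model and the tensor shapes are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# placeholder model; any module would work the same way
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))

# freeze all parameters
for param in model.parameters():
    param.requires_grad_(False)

# the input explicitly requires gradients
x = torch.randn(1, 10, requires_grad=True)

out = model(x)
out.sum().backward()

print(x.grad)                 # valid gradient for the input tensor
print(model[0].weight.grad)   # None, since the parameter is frozen
```

The backward pass still flows through the frozen layers (their weights are used to compute the input gradient), but because requires_grad is False for them, autograd stops accumulating into their .grad attributes.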