No, I don’t think your approach works as intended.
The `backward` call will not raise an error, since `nablaU` is differentiable, but because it is a newly created tensor inside the `forward` method it will never be updated.
Also, since you are detaching `nabla_x`, the parameters used before that point shouldn't get any gradients either.
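As a rough illustration, here is a minimal sketch of that failure mode. `NablaModel`, the layer, and the shapes are made-up stand-ins for your code; only the creation of `nablaU` inside `forward` and the `detach()` of `nabla_x` mimic what you describe:

```python
import torch
import torch.nn as nn

class NablaModel(nn.Module):          # hypothetical stand-in for the model in the question
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

    def forward(self, x):
        nabla_x = self.fc(x).detach()                 # detach cuts the graph to self.fc
        nablaU = torch.randn(4, requires_grad=True)   # new tensor on every call, not a registered parameter
        return nabla_x * nablaU

model = NablaModel()
out = model(torch.randn(2, 4))
out.mean().backward()                                 # no error, since nablaU is differentiable

for name, param in model.named_parameters():
    print(name, param.grad)                           # fc.weight / fc.bias: None
```

`nablaU` does get a gradient here, but it is recreated in every forward pass and is not registered as a parameter, so an optimizer would never update it.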
Run a single iteration as:

```python
output = model(input)
loss = criterion(output, target)
loss.backward()
```
and check the `.grad` attributes of the model's parameters via:

```python
for name, param in model.named_parameters():
    print("param {}, grad {}".format(name, param.grad))
```
which should show `None` (unless I'm missing how the output is still attached to the computation graph).
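For completeness, a self-contained version of that check could look like the sketch below; the `nn.Linear` model, `MSELoss` criterion, and random data are placeholders for your actual setup. With an intact graph the gradients are populated, whereas the model described above would print `None`:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)          # placeholder model
criterion = nn.MSELoss()          # placeholder criterion
input = torch.randn(8, 10)        # placeholder data
target = torch.randn(8, 1)

output = model(input)
loss = criterion(output, target)
loss.backward()

for name, param in model.named_parameters():
    print("param {}, grad {}".format(name, param.grad))  # gradient tensors, since nothing is detached
```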
Do not call `optimizer.zero_grad()` before the first `backward` call, as it will fill any already existing `.grad` attributes with zeros (in the default setup), and the `None` check above would then be inconclusive.
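To see why zeroing first would hide the issue, assume `.grad` was already populated by an earlier, working iteration; the sketch below passes `set_to_none=False` explicitly to get the zero-filling behavior referred to above:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(4, 10)).sum().backward()   # earlier iteration populates .grad

optimizer.zero_grad(set_to_none=False)       # .grad is now a tensor of zeros, not None

for name, param in model.named_parameters():
    print(name, param.grad)                  # zeros, so the "is .grad None?" check can no longer detect a broken graph
```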