Hi, I wanted to ask whether PyTorch considers the gradient w.r.t. the target in its computation. Consider the following example, where both the output and the target come from the same network:
import torch
import torch.nn as nn

net = nn.Linear(2, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)  # any optimizer over net's parameters

input = torch.tensor([1., 0.])
out = net(input)
target = net(torch.tensor([2., 2.]))
loss = nn.functional.mse_loss(out, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
Does the above code translate to the gradient update in equation 1 or equation 2?
If it translates to equation 2, how should I implement it so that the gradient update is as given in equation 1?
Basically, I want to update the network parameters considering the gradient w.r.t. both out and target.
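For reference, here is a minimal sketch I would use to check the behaviour, assuming the difference between the two updates is whether the gradient flows through target (calling `.detach()` on target blocks that flow). The seed and learning setup are just for illustration:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
net = torch.nn.Linear(2, 2)

# Case A: default behaviour -- target keeps its graph,
# so gradients flow through both out and target
out = net(torch.tensor([1., 0.]))
target = net(torch.tensor([2., 2.]))
F.mse_loss(out, target).backward()
grad_both = net.weight.grad.clone()

# Case B: target detached -- gradients flow only through out
net.zero_grad()
out = net(torch.tensor([1., 0.]))
target = net(torch.tensor([2., 2.])).detach()
F.mse_loss(out, target).backward()
grad_out_only = net.weight.grad.clone()

# If the two gradients differ, PyTorch is backpropagating
# through target whenever it is not detached
print(torch.allclose(grad_both, grad_out_only))
```

So my understanding is that the posted snippet already includes the gradient through target, and detaching would be the way to exclude it; I would like confirmation of this.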
Any help would be really appreciated.