Your use case should generally work as the target usually doesn’t require a gradient as seen here:
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 2)
x = torch.randn(1, 10)
pred = model(x)  # attached to the computation graph

with torch.no_grad():
    # pred_new is created without a graph and acts as the (constant) target
    pred_new = model(torch.randn(1, 10))

loss = F.mse_loss(pred, pred_new)
loss.backward()
print(model.weight.grad)  # valid gradients for model.weight
That being said, I don’t know how your model is defined or which part of your code is raising the error.
The difference from the example you show is that, in my second call to the model, the prediction (the model output) from the first call is used as an input without calling detach().
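Roughly, the pattern I mean looks like this (just a minimal sketch with illustrative names and shapes, not my actual code):

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 10)  # stand-in for my model
inp = torch.randn(1, 10)
gt = torch.randn(1, 10)

pred = model(inp)           # first call, builds a computation graph
pred_new = model(pred)      # second call reuses pred as an input, no detach()
loss = F.mse_loss(gt, pred_new)  # gradients would flow back through both calls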
Thanks for the code! The error is expected as none of the tensors is attached to a computation graph:
loss = F.mse_loss(gt, pred_new)
gt is the target tensor created directly via gt = torch.randn(1, 10), while pred_new is created in the no_grad() context and is therefore also not attached to a computation graph.
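For example, a minimal sketch (assuming a stand-in model such as nn.Linear(10, 10)) shows that neither tensor carries a grad_fn, so there is nothing to backpropagate through:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(10, 10)                 # stand-in for your model
gt = torch.randn(1, 10)                   # plain tensor, requires_grad=False
with torch.no_grad():
    pred_new = model(torch.randn(1, 10))  # no graph is recorded here
print(gt.requires_grad, pred_new.requires_grad)  # False False
loss = F.mse_loss(gt, pred_new)
loss.backward()  # raises: element 0 of tensors does not require grad and does not have a grad_fn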
Could you explain which gradients the backward() call should calculate?
I have a grad_fn for the operations from inp1 and inp2 to the output pred. I want the gradients to be computed for that part of the code, but with respect to the loss calculated with mse_loss, so that the parameters of the model can be updated with the optimizer’s step() function.
I feel one way I could achieve this is by writing the code like this. Could you verify the effect of copy.deepcopy here? What happens when the model is loaded onto the GPU?
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD
import copy


class Model(nn.Module):
    def __init__(self, grayscale=False):
        super().__init__()
        self.inp1 = nn.Linear(10, 2)
        self.inp2 = nn.Linear(10, 2)
        self.out = nn.Linear(4, 10)

    def forward(self, im1, im2):
        im1 = self.inp1(im1)
        im2 = self.inp2(im2)
        return self.out(torch.cat([im1, im2], dim=1))


model = Model()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_model = copy.deepcopy(model)

inp1 = torch.randn(1, 10)
inp2 = torch.randn(1, 10)
gt = torch.randn(1, 10)

pred = model(inp1, inp2)

# copy the model again so the second forward pass uses the current parameters
loss_model = copy.deepcopy(model)
pred_new = loss_model(pred, inp1)

loss = F.mse_loss(inp1, pred_new)
loss.backward()
optimizer.step()
Also, the memory footprint of the training is very high due to the copy of the model with all of its parameters. Is there an elegant way to avoid copying the model while achieving the same results?
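One variant I was wondering about (just a sketch continuing the snippet above; I’m not sure it keeps the semantics I want) would be to reuse the same model instance for the second forward pass instead of deepcopying it:

pred = model(inp1, inp2)
pred_new = model(pred, inp1)   # reuse the same parameters, no copy
loss = F.mse_loss(inp1, pred_new)
loss.backward()                # gradients from both passes accumulate into model
optimizer.step()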