Change Loss Value for AutoGrad

actang · February 20, 2018, 10:18pm

I am trying to backpropagate with a different loss value. The computation graph stays the same; the only difference is the loss value I want to backpropagate. I created loss_1, which has the right computation graph, and then overwrote its value with that of loss_2. In this simple linear regression setting, y = ax + b, the gradient I want on a for model_1 is 2 * (y_2 - y) * x, but that is different from model_1.linear1.weight.grad, which looks to me to be the same as 2 * (y_1 - y) * x. It seems that changing the loss value directly doesn’t change the gradients at all: the backward pass still uses the old loss. Is there any way to get around this?

import torch
from torch.autograd import Variable

torch.manual_seed(10)
dtype = torch.FloatTensor

class TestNet(torch.nn.Module):
    def __init__(self, D_in, D_out):
        super(TestNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, D_out)

    def forward(self, x):
        return self.linear1(x)

N, D_in, D_out = 1, 1, 1

x = Variable(torch.randn(N, D_in), requires_grad=True)
y = Variable(torch.randn(N, D_out), requires_grad=False)

# TestNet takes only (D_in, D_out); there is no hidden size H in this script
model_1 = TestNet(D_in, D_out)
model_2 = TestNet(D_in, D_out)

criterion = torch.nn.MSELoss(size_average=False)
optimizer_1 = torch.optim.SGD(model_1.parameters(), lr=1e-4)
optimizer_2 = torch.optim.SGD(model_2.parameters(), lr=1e-4)

# x.grad is still None before the first backward pass, so there is nothing to zero yet
y_2 = model_2(x)
loss_2 = criterion(y_2, y)
optimizer_2.zero_grad()
loss_2.backward()
x_grad_2 = x.grad.clone()

y_1 = model_1(x)
loss_1 = criterion(y_1, y)
# overwrite the stored value of loss_1 with the value of loss_2 (graph is unchanged)
loss_1.data.fill_(loss_2.item())
optimizer_1.zero_grad()
x.grad.data.zero_()
loss_1.backward()
x_grad_1 = x.grad.clone()
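
One possible reading of the question can be sketched as follows. Autograd computes gradients from the recorded graph and the values saved during the forward pass, so overwriting the number stored in loss_1 should not affect backward(). If the goal is the gradient 2 * (y_2 - y) * x on model_1's weight, one option is to start the backward pass at y_1 and supply 2 * (y_2 - y) as the upstream gradient. This is a minimal sketch under assumptions, not the thread's confirmed answer: it uses the current tensor API instead of Variable, bare torch.nn.Linear layers instead of TestNet, and variable names such as grad_overwritten and grad_custom that are made up for illustration.

```python
import torch

torch.manual_seed(10)

x = torch.randn(1, 1)  # input data
y = torch.randn(1, 1)  # target

# Stand-ins for model_1 and model_2 from the post (assumption: plain Linear layers)
model_1 = torch.nn.Linear(1, 1)
model_2 = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss(reduction="sum")

# Output and loss of model_2, computed without a graph so they act as constants
with torch.no_grad():
    y_2 = model_2(x)
    loss_2 = criterion(y_2, y)

# Attempt from the post: overwrite the stored loss value, then backpropagate
y_1 = model_1(x)
loss_1 = criterion(y_1, y)
loss_1.data.fill_(loss_2.item())
loss_1.backward()
grad_overwritten = model_1.weight.grad.clone()

# Alternative: backpropagate from y_1 with a hand-chosen upstream gradient
model_1.zero_grad()
y_1 = model_1(x)
y_1.backward(gradient=2 * (y_2 - y))
grad_custom = model_1.weight.grad.clone()

# Closed-form gradients of the sum-of-squares loss with respect to the weight
expected_1 = 2 * (y_1.detach() - y) * x  # what the overwritten loss still produces
expected_2 = 2 * (y_2 - y) * x           # what the post asks for

print(torch.allclose(grad_overwritten, expected_1))
print(torch.allclose(grad_custom, expected_2))
```

Passing gradient= to backward() replaces the implicit dL/dy_1 with the supplied tensor, which is why the second gradient follows the residual of model_2 while still flowing through model_1's graph.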

jpeg729 · February 21, 2018, 8:30am

A few remarks…

If x is simply your input data, then it doesn’t need requires_grad=True.
You use the same loss function for model_1 and model_2, but not the same optimizer.
model_1 and model_2 have different weights, so the gradients they produce should be different.
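
The first and third remarks can be checked with a short script. This is only a sketch under assumptions: it uses the current tensor API rather than Variable, bare torch.nn.Linear layers in place of TestNet, and an illustrative list named results. It leaves x without requires_grad and compares each model's weight gradient with the closed form 2 * (prediction - y) * x.

```python
import torch

torch.manual_seed(10)

x = torch.randn(1, 1)  # plain input data; requires_grad stays at its default (False)
y = torch.randn(1, 1)  # target

# Two separately constructed layers draw independent random initial weights
model_1 = torch.nn.Linear(1, 1)
model_2 = torch.nn.Linear(1, 1)
criterion = torch.nn.MSELoss(reduction="sum")

results = []
for model in (model_1, model_2):
    pred = model(x)
    criterion(pred, y).backward()
    # Closed-form gradient of (pred - y)^2 with respect to the weight
    closed_form = 2 * (pred.detach() - y) * x
    results.append((model.weight.grad.clone(), closed_form))

print("x.requires_grad:", x.requires_grad, "x.grad:", x.grad)
for autograd_grad, closed_form in results:
    print(autograd_grad, closed_form)
```

Because each gradient depends on that model's own prediction, models with different weights generally produce different gradients for the same x and y.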

I am really confused. Can you restate your question a little more precisely?