Using a coefficient for loss

Hey, I’m a little new to PyTorch and have been playing around. I noticed that after computing the loss, I can’t seem to change its effect at all. For example, if the loss value is 10 and I do (loss * 1.5).backward(), or multiply by any other number for that matter, it doesn’t seem to affect the weights any differently than plain loss.backward(). I tested this with a very small network and dummy data: I made a copy of the network, then backpropagated and stepped with loss.backward() for the original and (loss*1.5).backward() for the copy. When I printed the outputs of both networks afterwards, they came out the same. I did come across a post that mentioned something similar, saying that what matters is the gradient along with its sign, but then I don’t understand how I would handle cases where the loss has a coefficient, as in many reinforcement learning algorithms. How can I successfully use a coefficient for the loss when I call backward? Or does (loss * 1.5).backward() actually work and my understanding is wrong?

Hi Sainath!

It works fine.

Here is a quick example that shows both the gradient and the
optimizer step:

>>> import torch
>>> torch.__version__
>>> ta = torch.tensor ([1.0], requires_grad = True)
>>> tb = torch.tensor ([1.0], requires_grad = True)
>>> opta = torch.optim.SGD ([ta], lr = 0.1)
>>> optb = torch.optim.SGD ([tb], lr = 0.1)
>>> ta.backward()
>>> ta.grad
tensor([1.])
>>> opta.step()
>>> ta
tensor([0.9000], requires_grad=True)
>>> (1.5 * tb).backward()
>>> tb.grad
tensor([1.5000])
>>> optb.step()
>>> tb
tensor([0.8500], requires_grad=True)
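The same holds for a whole network: by the chain rule, the gradient of c * loss is exactly c times the gradient of loss for every parameter. A minimal sketch that checks this, assuming a small made-up network and random dummy data (`net`, `x`, `y` and `c` are illustrative names, not from the thread):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the run is reproducible

# A tiny, hypothetical network and dummy batch, just to have some parameters.
net = nn.Sequential(nn.Linear(5, 5), nn.ReLU(), nn.Linear(5, 1))
x = torch.randn(8, 5)
y = torch.randn(8, 1)
criterion = nn.MSELoss()
c = 1.5  # the loss coefficient

# Gradients of the plain loss.
net.zero_grad()
criterion(net(x), y).backward()
plain_grads = [p.grad.clone() for p in net.parameters()]

# Gradients of the scaled loss (fresh backward pass on the same data).
net.zero_grad()
(c * criterion(net(x), y)).backward()
scaled_grads = [p.grad.clone() for p in net.parameters()]

# Every parameter's gradient is c times larger (up to float rounding).
for g_plain, g_scaled in zip(plain_grads, scaled_grads):
    assert torch.allclose(g_scaled, c * g_plain)
print("all gradients scaled by", c)
```

So the coefficient does reach the gradients; whether it reaches the weights depends on what the optimizer does with them.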


K. Frank

Hi! Thanks for the response!
It does work here. Correct me if I’m wrong, but what I tried to do was:

  • Define a simple neural network, instantiate it and make a copy of it
  • Create dummy data, which in this case was just a list [1,2,3,4,5], with the targets also being [1,2,3,4,5]
  • Define separate optimizers for the two networks and use MSE loss

With this, I wanted to compare the outputs of both networks after a single forward pass over the list, a backward pass and an optimizer step, where the original was backpropagated with loss.backward() and the copy with (loss*100).backward().
My understanding is that the two networks should then produce different outputs, because their gradients differ. But what I found was that both networks produce the same output the second time.
I guess I must be going wrong somewhere.

Hi Sainath!

Please post a small, complete, runnable script that shows this issue,
together with its output.


K. Frank

Hi! Here is my code:

import copy

import torch
import torch.nn as nn
import torch.optim as optim

class model(nn.Module):
    def __init__(self):
        super().__init__()  # required before assigning submodules
        self.seq = nn.Sequential(nn.Linear(5, 5), nn.Linear(5, 5), nn.Linear(5, 5))

    def forward(self, x):
        return self.seq(x)

x_train = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float)
y_train = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float)

m1 = model()
m2 = copy.deepcopy(m1)  # identical starting weights, so only the loss scaling differs
criterion = nn.MSELoss()
o1 = optim.Adam(m1.parameters(), lr=1e-3)  # replace optim.Adam with optim.SGD for the SGD run
o2 = optim.Adam(m2.parameters(), lr=1e-3)

for i in range(2):
    out1 = m1(x_train)
    out2 = m2(x_train)

    print('out1:', out1)
    print('out2:', out2)

    loss1 = criterion(out1, y_train)
    loss2 = criterion(out2, y_train)

    o1.zero_grad()
    o2.zero_grad()
    loss1.backward()          # original network: plain loss
    (loss2 * 100).backward()  # copy: loss scaled by 100
    o1.step()
    o2.step()


And here is the output:

out1: tensor([ 0.1006, -1.0441,  1.1687, -0.7200, -0.5491], grad_fn=<AddBackward0>)
out2: tensor([ 0.1006, -1.0441,  1.1687, -0.7200, -0.5491], grad_fn=<AddBackward0>)
out1: tensor([ 0.1065, -1.0204,  1.1568, -0.7082, -0.5328], grad_fn=<AddBackward0>)
out2: tensor([ 0.1065, -1.0204,  1.1568, -0.7082, -0.5328], grad_fn=<AddBackward0>)

I noticed that you used the SGD optimizer and I used Adam. When I switch to the SGD optimizer, it works fine and the outputs are different.
Output with the SGD optimizer:

out1: tensor([ 0.6368, -0.8041, -0.0553,  0.4925, -0.5171], grad_fn=<AddBackward0>)
out2: tensor([ 0.6368, -0.8041, -0.0553,  0.4925, -0.5171], grad_fn=<AddBackward0>)
out1: tensor([ 0.6088, -0.7789, -0.0402,  0.4936, -0.4706], grad_fn=<AddBackward0>)
out2: tensor([-2.0355, -0.3274, -0.3055, -1.2581,  1.1941], grad_fn=<AddBackward0>)

Switching from Adam to SGD seems to bring about the expected change, but I don’t know why. Is this supposed to happen with Adam?
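One way to see why is to compare the two update rules. This is a sketch assuming a constant coefficient c > 0 applied at every step and no weight decay; g_t is the gradient of the unscaled loss at step t and η is the learning rate. `torch.optim.Adam` follows the standard Adam update, with defaults β₁ = 0.9, β₂ = 0.999 and ε = 1e-8:

```latex
\begin{aligned}
\text{SGD:}\quad & \theta_{t+1} = \theta_t - \eta\, c\, g_t \\[4pt]
\text{Adam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, c\, g_t,
  \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, c^2 g_t^2, \\
& \hat m_t = \frac{m_t}{1-\beta_1^t}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^t}, \qquad
  \theta_{t+1} = \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon} \\[4pt]
\text{with } m_0 = v_0 = 0:\quad & \hat m_t = c\, \tilde m_t, \quad \hat v_t = c^2\, \tilde v_t
  \;\Longrightarrow\;
  \theta_{t+1} = \theta_t - \eta\, \frac{\tilde m_t}{\sqrt{\tilde v_t} + \epsilon / c}
\end{aligned}
```

Here \tilde m_t and \tilde v_t are the bias-corrected moment estimates you would get from the unscaled loss. With SGD the coefficient multiplies the step directly. With Adam, scaling the loss by c is equivalent to training on the unscaled loss with ε replaced by ε/c, so for c = 100 the update is essentially unchanged, which is consistent with the identical Adam outputs above.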

Oh okay, I think I’ve got it. It was my fault after all. The 100 in (loss * 100).backward() made no difference, but changing it to a much smaller number such as 0.0001 does. Sorry! I’m not really sure why, though; I guess I should do a thorough read on Adam.
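To see the Adam behaviour numerically, here is a minimal sketch on a single hypothetical parameter `theta`. The toy losses `loss_a`/`loss_b` and the helper `one_step` are made up for illustration and are not from the thread. It takes one step with `torch.optim.SGD` and `torch.optim.Adam` (otherwise default settings, so Adam’s eps is 1e-8) on `c * loss` for several values of c, and then on `loss_a + w * loss_b`, the kind of weighted sum used in many RL objectives:

```python
import torch

LR = 1e-3  # same learning rate for both optimizers


def one_step(opt_cls, loss_fn):
    """Start from theta = 0, take one optimizer step on loss_fn(theta), return theta."""
    theta = torch.zeros(1, requires_grad=True)
    opt = opt_cls([theta], lr=LR)
    opt.zero_grad()
    loss_fn(theta).backward()
    opt.step()
    return theta.item()


# Two hypothetical loss terms that pull theta in opposite directions.
def loss_a(t):
    return ((t - 1.0) ** 2).sum()  # minimum at theta = 1; gradient at 0 is -2


def loss_b(t):
    return ((t + 1.0) ** 2).sum()  # minimum at theta = -1; gradient at 0 is +2


# 1) One constant coefficient on the whole loss.
scaled = {}
for name, opt_cls in [("SGD", torch.optim.SGD), ("Adam", torch.optim.Adam)]:
    for c in (1.0, 100.0, 1e-8):
        scaled[name, c] = one_step(opt_cls, lambda t, c=c: c * loss_a(t))
        print(f"{name:4s} c={c:g}: theta = {scaled[name, c]:.3g}")

# 2) A coefficient that weights one loss term against another.
weighted = {}
for w in (0.5, 2.0):
    weighted[w] = one_step(torch.optim.Adam, lambda t, w=w: loss_a(t) + w * loss_b(t))
    print(f"Adam w={w:g}: theta = {weighted[w]:.3g}")
```

With SGD the step grows in proportion to c (theta ends near 0.002 for c = 1 and near 0.2 for c = 100). With Adam the step stays at about lr = 0.001 for both, and only shrinks (to about two thirds of lr) at c = 1e-8, where the scaled gradient is comparable to eps. The weight w, on the other hand, changes which term dominates the combined gradient, so under Adam it flips the direction of the step. In other words, with Adam a single overall coefficient on the loss does very little, while a coefficient that balances terms in a sum behaves as you would expect.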