Using a coefficient for loss

Hey, I’m a little new to PyTorch and have been playing around. I noticed that after computing the loss, I can’t seem to change its effect on the weights at all. For example, if the loss value is 10 and I call (loss * 1.5).backward(), or use any other multiplier, the weights don’t appear to be updated any differently than with the plain loss. I tested this with a very small network and dummy data: I printed the outputs from two copies of the network after backpropagating and stepping with different losses (loss.backward() for the original and (loss*1.5).backward() for the copy), and both outputs came out the same.

I did come across a post mentioning something similar, which said that what matters is the gradient along with its sign, but I don’t understand how I would handle cases where the loss has a coefficient, as in many reinforcement learning algorithms. How can I use a coefficient for the loss when I call backward? Or does (loss * 1.5).backward() actually work, and my understanding is wrong?

Hi Sainath!

It works fine.

Here is a quick example that shows both the gradient and the
optimizer step:

```
>>> import torch
>>> torch.__version__
'1.7.1'
>>> ta = torch.tensor ([1.0], requires_grad = True)
>>> tb = torch.tensor ([1.0], requires_grad = True)
>>> opta = torch.optim.SGD ([ta], lr = 0.1)
>>> optb = torch.optim.SGD ([tb], lr = 0.1)
>>> ta.backward()
>>> ta.grad
tensor([1.])
>>> opta.step()
>>> ta
tensor([0.9000], requires_grad=True)
>>> (1.5 * tb).backward()
>>> tb.grad
tensor([1.5000])
>>> optb.step()
>>> tb
tensor([0.8500], requires_grad=True)
```
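The same reasoning extends to the case raised in the question, where a coefficient weights one term of a combined loss. Below is a minimal sketch, not taken from any particular reinforcement learning algorithm: `loss_a`, `loss_b`, and `coef` are hypothetical stand-ins for two loss terms and their weighting. It checks that the gradient of the weighted sum equals the weighted sum of the individual gradients.

```python
import torch

torch.manual_seed(0)

# Hypothetical shared parameters and input, for illustration only.
w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

def loss_a(w):
    # Stand-in for a first loss term (e.g. a prediction error).
    return ((w * x).sum() - 1.0) ** 2

def loss_b(w):
    # Stand-in for a second loss term (e.g. a penalty).
    return (w ** 2).sum()

coef = 0.5  # assumed weighting coefficient

# Gradient of each term on its own; autograd.grad does not touch w.grad.
(grad_a,) = torch.autograd.grad(loss_a(w), w)
(grad_b,) = torch.autograd.grad(loss_b(w), w)

# Gradient of the weighted sum, accumulated into w.grad by backward().
total = loss_a(w) + coef * loss_b(w)
total.backward()

# The coefficient scales only the contribution of loss_b.
print(torch.allclose(w.grad, grad_a + coef * grad_b))
```

Because differentiation is linear, the coefficient carries through `backward()` unchanged; whether it then changes the parameter update depends on the optimizer that consumes the gradient.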

Best.

K. Frank

Hi! Thanks for the response!
It does work here. Correct me if I’m wrong, but here is what I attempted:

• Define a simple neural network, instantiate it, and make a copy of it
• Use dummy data, which in this case was just the list [1,2,3,4,5], with the targets also being [1,2,3,4,5]
• Define separate optimizers for the two networks and use MSE loss

With this setup, I compared the outputs of both networks after a single pass over the list and a backward pass, where the original was backpropagated with loss.backward() and the copy with (loss*100).backward().
My understanding is that these two networks should now produce different outputs because of the different gradient values. But what I found was that both networks produce the same output the second time.
I guess I must be going wrong somewhere.
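One way to separate the backward pass from the optimizer in an experiment like the one described above is to compare the gradients directly, before any step is taken. The following is a rough sketch under assumptions: it uses a single `nn.Linear` layer rather than the actual network, and `copy.deepcopy` to obtain identical starting weights.

```python
import copy

import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in network and an exact copy with identical weights.
net = nn.Linear(5, 5)
net_copy = copy.deepcopy(net)

x = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
y = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])
criterion = nn.MSELoss()

# Original: plain loss. Copy: loss scaled by 100.
criterion(net(x), y).backward()
(criterion(net_copy(x), y) * 100).backward()

# Compare gradients parameter by parameter; no optimizer is involved.
for p, q in zip(net.parameters(), net_copy.parameters()):
    print(torch.allclose(q.grad, 100 * p.grad))
```

If the gradients of the copy come out 100 times larger, the coefficient is reaching the backward pass, and any identical outputs after stepping would have to come from how the optimizer uses those gradients.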

Hi Sainath!

Please post a small, complete, runnable script that shows this issue,
together with its output.

Best.

K. Frank

Hi! Here is my code:

```
import torch.nn as nn
import torch
import torch.optim as optim

class model(nn.Module):
    def __init__(self):
        super().__init__()
        # Three stacked linear layers with no activation in between
        self.seq = nn.Sequential(nn.Linear(5,5), nn.Linear(5,5),nn.Linear(5,5))
    def forward(self, x):
        return self.seq(x)

# Dummy data: the targets are the same as the inputs
x_train = torch.tensor([1,2,3,4,5], dtype = torch.float)
y_train = torch.tensor([1,2,3,4,5], dtype = torch.float)

m1 = model()
m2 = model()
criterion = nn.MSELoss()
# One Adam optimizer per network
o1 = optim.Adam(m1.parameters(), lr = 1e-3)
o2 = optim.Adam(m2.parameters(), lr = 1e-3)

for i in range(2):

    out1 = m1(x_train)
    out2 = m2(x_train)

    print('out1:', out1)
    print('out2:', out2)

    # m1 uses the plain loss, m2 uses the loss scaled by 100
    loss1 = criterion(out1, y_train)
    loss1.backward()
    loss2 = criterion(out2, y_train)
    (loss2 * 100).backward()

    o1.step()
    o2.step()
```

And here is the output:

```
out1: tensor([ 0.1006, -1.0441,  1.1687, -0.7200, -0.5491], grad_fn=<AddBackward0>)
```

I noticed that you used the SGD optimizer, whereas I used Adam. When I switch to the SGD optimizer, it works as expected and the outputs are different.
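This difference between the two optimizers is consistent with how Adam forms its update: it divides a running average of the gradient by the square root of a running average of the squared gradient, so a constant factor on the loss largely cancels (up to the small `eps` term). The sketch below is illustrative only; the helper `one_step` and the toy loss `w ** 2` are assumptions and not part of the code above.

```python
import torch

def one_step(optimizer_cls, scale, **kwargs):
    """Take one optimizer step on a toy loss multiplied by `scale`."""
    w = torch.tensor([1.0], requires_grad=True)
    opt = optimizer_cls([w], **kwargs)
    loss = (w ** 2).sum() * scale
    loss.backward()
    opt.step()
    return w.detach().clone()

# SGD: the update is lr * grad, so it grows with the coefficient.
sgd_1 = one_step(torch.optim.SGD, 1.0, lr=0.1)
sgd_100 = one_step(torch.optim.SGD, 100.0, lr=0.1)

# Adam: the update is normalized by the gradient's own magnitude.
adam_1 = one_step(torch.optim.Adam, 1.0, lr=0.1)
adam_100 = one_step(torch.optim.Adam, 100.0, lr=0.1)

print('SGD :', sgd_1, sgd_100)
print('Adam:', adam_1, adam_100)
print('Adam results match:', torch.allclose(adam_1, adam_100))
```

In this sketch the two SGD results differ, while the two Adam results agree to within floating-point tolerance. With Adam, a coefficient on a single overall loss therefore has little effect, whereas a coefficient that weights one term relative to another still changes the direction of the combined gradient.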
Output with SGD optimizer:

```
out1: tensor([ 0.6368, -0.8041, -0.0553,  0.4925, -0.5171], grad_fn=<AddBackward0>)