Hey, I’m a little new to PyTorch and have been playing around. I noticed that after computing the loss, I am unable to affect the weight updates by scaling it. For example, if the loss value is 10 and I call (loss * 1.5).backward() (or use any other factor), it doesn’t seem to affect the weights any differently than when the loss is just 10. I tried this with a very small network and dummy data, and printed the outputs from copies of the network after backpropping and stepping with different losses (loss.backward() for the original and (loss * 1.5).backward() for the copy), and both outputs came out the same. I did come across a post mentioning something similar, which said that the gradient along with its sign is what matters, but I don’t understand how I would handle cases where I have a coefficient on the loss, as in many reinforcement learning algorithms. How can I successfully use a coefficient on the loss when I call backward? Or does (loss * 1.5).backward() actually work and is my understanding wrong?
It works fine.
Here is a quick example that shows both the gradient and the resulting optimizer step scaling with the coefficient:
>>> import torch
>>> torch.__version__
'1.7.1'
>>> ta = torch.tensor([1.0], requires_grad=True)
>>> tb = torch.tensor([1.0], requires_grad=True)
>>> opta = torch.optim.SGD([ta], lr=0.1)
>>> optb = torch.optim.SGD([tb], lr=0.1)
>>> ta.backward()
>>> ta.grad
tensor([1.])
>>> opta.step()
>>> ta
tensor([0.9000], requires_grad=True)
>>> (1.5 * tb).backward()
>>> tb.grad
tensor([1.5000])
>>> optb.step()
>>> tb
tensor([0.8500], requires_grad=True)
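(This is exactly what plain SGD should do: its update is param -= lr * grad, so the factor of 1.5 scales the gradient from 1.0 to 1.5 and the step from 0.1 to 0.15.)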
Hi! Thanks for the response!
Your example does work. Correct me if I’m wrong, but what I tried was:
- Define a simple neural network, instantiate it and make a copy of it
- Have dummy data, which in this case was just the list [1, 2, 3, 4, 5], with the targets also being [1, 2, 3, 4, 5]
- Define separate optimizers for the two networks and use MSE loss
With this, what I attempted to compare was the outputs of both networks after a single pass over the list and a backprop each, where the original was backpropped with loss.backward() and the copy with (loss * 100).backward().
My understanding is that the two networks should then produce different outputs, due to the differently scaled gradients. But what I found was that both networks produce the same output the second time.
I guess I must be going wrong somewhere.
Please post a small, complete, runnable script that shows this issue,
together with its output.
Hi! Here is my code:
import torch.nn as nn
import torch
import torch.optim as optim

class model(nn.Module):
    def __init__(self):
        super().__init__()
        self.seq = nn.Sequential(nn.Linear(5, 5), nn.Linear(5, 5), nn.Linear(5, 5))

    def forward(self, x):
        return self.seq(x)

x_train = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float)
y_train = torch.tensor([1, 2, 3, 4, 5], dtype=torch.float)

m1 = model()
m2 = model()
m2.load_state_dict(m1.state_dict())

criterion = nn.MSELoss()
o1 = optim.Adam(m1.parameters(), lr=1e-3)
o2 = optim.Adam(m2.parameters(), lr=1e-3)

for i in range(2):
    o1.zero_grad()
    o2.zero_grad()
    out1 = m1(x_train)
    out2 = m2(x_train)
    print('out1:', out1)
    print('out2:', out2)
    loss1 = criterion(out1, y_train)
    loss1.backward()
    loss2 = criterion(out2, y_train)
    (loss2 * 100).backward()
    o1.step()
    o2.step()
And here is the output:
out1: tensor([ 0.1006, -1.0441, 1.1687, -0.7200, -0.5491], grad_fn=<AddBackward0>)
out2: tensor([ 0.1006, -1.0441, 1.1687, -0.7200, -0.5491], grad_fn=<AddBackward0>)
out1: tensor([ 0.1065, -1.0204, 1.1568, -0.7082, -0.5328], grad_fn=<AddBackward0>)
out2: tensor([ 0.1065, -1.0204, 1.1568, -0.7082, -0.5328], grad_fn=<AddBackward0>)
I noticed that you used the SGD optimizer while I used Adam. After switching to the SGD optimizer, it works fine and the outputs are different.
Output with SGD optimizer:
out1: tensor([ 0.6368, -0.8041, -0.0553, 0.4925, -0.5171], grad_fn=<AddBackward0>)
out2: tensor([ 0.6368, -0.8041, -0.0553, 0.4925, -0.5171], grad_fn=<AddBackward0>)
out1: tensor([ 0.6088, -0.7789, -0.0402, 0.4936, -0.4706], grad_fn=<AddBackward0>)
out2: tensor([-2.0355, -0.3274, -0.3055, -1.2581, 1.1941], grad_fn=<AddBackward0>)
Switching from Adam to SGD seems to bring about the expected change, but I do not know why. Is this supposed to happen with Adam?
Oh okay, I think I got it. It was my fault indeed. The 100 in (loss * 100).backward() did not make a visible difference, while changing it to a much smaller number such as 0.0001 does. Sorry! Though I’m not really sure why; I guess I should do a thorough read on Adam.
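This is actually expected behaviour with Adam rather than a mistake on your end. Adam’s update is roughly lr * m_hat / (sqrt(v_hat) + eps), so a constant factor on the loss multiplies both m_hat and sqrt(v_hat) and cancels out of the update; only the eps term breaks this invariance, and it only matters once the scaled gradients approach eps in magnitude. Here is a minimal sketch (my own toy example, not code from this thread) showing that the first Adam step on a single parameter stays at roughly lr regardless of the loss scale, until the scaled gradient gets close to the default eps of 1e-8:

import torch

# Toy illustration (hypothetical example, not from the thread above):
# Adam's first step is about lr * g / (|g| + eps), so a constant scale
# on the loss cancels out -- until scale * |grad| approaches eps.
for scale in (1.0, 100.0, 1e-4, 1e-8):
    p = torch.tensor([1.0], requires_grad=True)
    opt = torch.optim.Adam([p], lr=1e-3)  # default eps = 1e-8
    loss = (scale * p).sum()  # d(loss)/dp is exactly `scale`
    loss.backward()
    opt.step()
    # size of the first step; ~lr (0.001) for any scale well above eps
    print(f'scale={scale:g}  step={1.0 - p.item():.8f}')

The printed step stays at about 0.001 for scales 1.0, 100.0, and even 1e-4, and only collapses (to about 0.0005 here) once the scaled gradient is comparable to eps. In your network, a coefficient of 0.0001 is presumably enough to push the smallest per-weight gradients into that regime, which is why those outputs diverge while the factor of 100 changes nothing visible. Plain SGD, by contrast, steps by lr * grad, so any constant factor shows up directly.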