I want to be sure that scaling the loss in PyTorch produces the same updated weights as changing the learning rate would.

The gradients stored on the weights (`.grad`) will differ by the scale factor, but that's fine.

This math works on paper, so I’m curious about the implementation.
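For reference, here is the on-paper argument for a plain SGD step (no momentum or weight decay), writing the learning rate as $\eta$ and the loss scale as $k$:

```
w \leftarrow w - \eta \,\nabla_w (k L) = w - (\eta k)\,\nabla_w L
```

So scaling the loss by $k$ with learning rate $\eta$ gives the same update as leaving the loss alone and using learning rate $\eta k$.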

In short: is `Loss * 1` with `lr=2` the same as `Loss * 2` with `lr=1`?

EDIT: and would that apply to any equivalent pair?

Here are two short snippets that demonstrate the behavior I want:

- With `lr=2`

```
import torch as t
inp = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
w1 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
w2 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
a = (inp*w1)
b = (a*w2)
L = b.sum()
L.backward()
opt = t.optim.SGD([w1,w2],lr=2)
opt.step()
print(w1,w2,sep='\n')
print(w1.grad,w2.grad,sep='\n')
```

- With `L * 2` and `lr=1`

```
import torch as t
inp = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
w1 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
w2 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
a = (inp*w1)
b = (a*w2)
L = b.sum()
L = 2*L
L.backward()
opt = t.optim.SGD([w1,w2],lr=1)
opt.step()
print(w1,w2,sep='\n')
print(w1.grad,w2.grad,sep='\n')
```
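For what it's worth, the two runs above can be compared directly in one script. This is just a sketch of the check I have in mind (the helper `run` is mine, not from the snippets above), and the equivalence it tests only holds for plain SGD; momentum, weight decay, or adaptive optimizers like Adam would break it:

```
import torch as t

def run(scale, lr):
    # same initialization as the snippets above
    inp = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]])
    w1 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
    w2 = t.tensor([[.1,.2,.3],[.2,.3,.4],[.4,.5,.6]], requires_grad=True)
    L = (inp * w1 * w2).sum() * scale  # scale the loss by `scale`
    L.backward()
    t.optim.SGD([w1, w2], lr=lr).step()
    return w1.detach(), w2.detach()

a1, a2 = run(scale=1, lr=2)  # Loss * 1, lr = 2
b1, b2 = run(scale=2, lr=1)  # Loss * 2, lr = 1
print(t.allclose(a1, b1), t.allclose(a2, b2))
```

If the updates match, both `allclose` calls should print `True`, since `lr * scale` is 2 in both runs.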