Lower limit on the learning rate in PyTorch torch.optim

I was going through the following information on reducing the learning rate in PyTorch to a really low value, like 1e-9.

I am puzzled as to why doing loss = loss / 100 is equivalent to reducing the learning rate by a factor of 100. The full snippet is below.

outputs = model(batch)
loss = criterion(outputs, targets)

# Equivalent to lowering the learning rate by a factor of 100
loss = loss / 100

self.optimizer.zero_grad()
loss.backward()
self.optimizer.step()

Can someone please help me with this?

Thanks a lot

If you scale the loss, you’ll also scale the gradients.
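For plain SGD this is exact: the update is w ← w − lr · ∇loss, and since ∇(loss / 100) = (∇loss) / 100, dividing the loss by 100 produces the same step as dividing lr by 100.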
In a simple use case, this can be used instead of changing the learning rate, as seen here:

# Setup
import torch
import torch.nn as nn

torch.manual_seed(2809)

lin = nn.Linear(2, 2, bias=False)
x = torch.randn(1, 2)

# Standard approach
out = lin(x)
loss = out.sum()
print(loss)
loss.backward()
print(lin.weight.grad)
> tensor(-0.8130, grad_fn=<SumBackward0>)
tensor([[-1.1281,  0.8386],
        [-1.1281,  0.8386]])

# loss scaling by x10
lin.zero_grad()
out = lin(x)
loss = out.sum() * 10
print(loss)
loss.backward()
print(lin.weight.grad)
> tensor(-8.1301, grad_fn=<MulBackward0>)
tensor([[-11.2812,   8.3855],
        [-11.2812,   8.3855]])

# loss scaling by x0.1
lin.zero_grad()
out = lin(x)
loss = out.sum() * 0.1
print(loss)
loss.backward()
print(lin.weight.grad)
> tensor(-0.0813, grad_fn=<MulBackward0>)
tensor([[-0.1128,  0.0839],
        [-0.1128,  0.0839]])
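
As a sanity check (my own sketch, not part of the original snippet), you can compare one plain SGD step using the scaled loss against one step using the scaled learning rate; the resulting weights match:

# Illustrative sketch: loss / 10 with lr should match
# the unscaled loss with lr / 10 for plain SGD.
import copy

import torch
import torch.nn as nn

torch.manual_seed(2809)

lin_a = nn.Linear(2, 2, bias=False)
lin_b = copy.deepcopy(lin_a)  # identical initial weights
x = torch.randn(1, 2)

# Variant A: scaled loss, full learning rate
opt_a = torch.optim.SGD(lin_a.parameters(), lr=0.1)
(lin_a(x).sum() / 10).backward()
opt_a.step()

# Variant B: unscaled loss, learning rate divided by 10
opt_b = torch.optim.SGD(lin_b.parameters(), lr=0.01)
lin_b(x).sum().backward()
opt_b.step()

# Both variants end up with (numerically) the same weights
print(torch.allclose(lin_a.weight, lin_b.weight))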

However, I would be careful with more advanced optimizers such as Adam, which track running averages of the gradients (and squared gradients) or otherwise adapt the step size internally; for those, scaling the loss is not equivalent to scaling the learning rate.
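
Adam is a concrete example (again my own sketch): its per-parameter step is roughly gradient / sqrt(running average of squared gradients), so a constant factor on the loss largely cancels out. Scaling the loss by 0.1 barely changes the step, while lowering the learning rate by 10x obviously would:

# Illustrative sketch: with Adam, scaling the loss is NOT equivalent
# to scaling the learning rate.
import copy

import torch
import torch.nn as nn

torch.manual_seed(2809)

lin_a = nn.Linear(2, 2, bias=False)
lin_b = copy.deepcopy(lin_a)
x = torch.randn(1, 2)

opt_a = torch.optim.Adam(lin_a.parameters(), lr=0.1)
(lin_a(x).sum() * 0.1).backward()  # scaled loss
opt_a.step()

opt_b = torch.optim.Adam(lin_b.parameters(), lr=0.1)
lin_b(x).sum().backward()          # unscaled loss
opt_b.step()

# The constant factor cancels in Adam's normalization, so both
# variants take (almost) the same step despite the scaled loss.
print(torch.allclose(lin_a.weight, lin_b.weight, atol=1e-4))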
