Greetings everyone,
I am new to PyTorch and to this forum, and I hope I can get a little help here and there.
I just started working through the PyTorch 60 Minute Blitz tutorial, and I noticed that in one of its examples, after a single step of an SGD optimizer, the loss of a pretrained model did not decrease in absolute value:
import torch, torchvision
# %% Downloading stuff
model = torchvision.models.resnet18(pretrained=True)
# %% Rest
data = torch.rand(1, 3, 64, 64)   # a single random input image
labels = torch.rand(1, 1000)      # a random target vector
predic = model(data)
loss = (predic - labels).sum()    # sum of raw differences as a makeshift loss
print(loss)  # loss is negative
loss.backward()
optim = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
optim.step()                      # a single optimization step
predic = model(data)              # second forward pass with the updated weights
loss = (predic - labels).sum()
print(loss)  # loss is even more negative, i.e. larger absolute value
This essentially follows the example here:
https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
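In case it helps, here is how I would re-check the numbers (just a sketch of my own, not from the tutorial; the names loss_before/loss_after are mine). I put the model in eval mode so the batch-norm layers in resnet18 use their running statistics, and wrap the second forward pass in torch.no_grad(), since I am not sure whether train-mode behaviour affects the comparison:

import torch, torchvision

# pretrained=True as in the tutorial; newer torchvision versions prefer the weights argument
model = torchvision.models.resnet18(pretrained=True)
model.eval()  # batch norm uses running stats, so both forward passes are comparable

data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
optim = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

loss_before = (model(data) - labels).sum()
optim.zero_grad()
loss_before.backward()
optim.step()

with torch.no_grad():  # no autograd graph needed for the check
    loss_after = (model(data) - labels).sum()
print(loss_before.item(), loss_after.item())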
Shouldn't the absolute value of the loss decrease after a single optimization step when there is only one data point? Or does this have to do with the fact that the model is pretrained?
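To make concrete what I expected, here is a toy one-parameter example (again my own sketch, not from the tutorial) where a single SGD step does reduce the loss:

import torch

w = torch.tensor([2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()  # quadratic loss with a minimum at w = 0
loss.backward()        # dloss/dw = 2w = 4
opt.step()             # w -> 2 - 0.1 * 4 = 1.6

with torch.no_grad():
    print(loss.item(), (w ** 2).sum().item())  # 4.0 -> 2.56, the loss decreased

This is the behaviour I expected from the resnet18 example as well.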
Best,
PhysicsIsFun