Hello, I am experimenting with different optimizers, and I found that momentum in SGD has no effect on my results.
I am working in a fully reproducible environment (all seeds are set).
This is how I create the SGD optimizer:
optimizer = optim.SGD(model.parameters(), lr = lr / 30)
This is how I create one with momentum:
optimizer = optim.SGD(model.parameters(), lr = lr / 30, momentum = 0.9)
Then I tried this one too:
optimizer = optim.SGD(model.parameters(), lr = lr / 30, momentum = 0.5)
All results (losses and scores) are the same in each case.
Then I tried this one:
optimizer = optim.SGD(model.parameters(), lr = lr / 30, nesterov = True, momentum = 0.9)
and finally I see different results.
Does this mean momentum is unused by default? From reading the code, momentum looks like it defaults to 0, so it should be used when I pass a value. Any tips?
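For reference, here is a plain-Python sketch of how I read the momentum update in the SGD source. It is my own simplification for a single scalar parameter, not the library code: the function name sgd_updates and the constant gradients are made up for illustration, and I am assuming the momentum buffer is initialised to the first gradient, as it appears to be in the source. If I read it right, the first step with plain momentum is the same as without momentum, and only Nesterov differs on step one.

```python
def sgd_updates(grads, lr, momentum=0.0, nesterov=False, dampening=0.0):
    """Return the update applied to a scalar parameter at each step.

    Hypothetical helper: a simplified reading of the SGD update rule,
    with the gradients supplied up front instead of computed from a model.
    """
    buf = None  # momentum buffer, created lazily on the first step
    updates = []
    for g in grads:
        d_p = g
        if momentum != 0:
            if buf is None:
                # Assumption: the buffer starts as a copy of the first gradient.
                buf = g
            else:
                buf = momentum * buf + (1 - dampening) * g
            # Nesterov adds a look-ahead term; plain momentum uses the buffer.
            d_p = g + momentum * buf if nesterov else buf
        updates.append(-lr * d_p)
    return updates


grads = [1.0, 1.0, 1.0]  # made-up constant gradients, for illustration only
lr = 0.1

plain = sgd_updates(grads, lr)
mom09 = sgd_updates(grads, lr, momentum=0.9)
mom05 = sgd_updates(grads, lr, momentum=0.5)
nest = sgd_updates(grads, lr, momentum=0.9, nesterov=True)

for name, ups in [("plain", plain), ("momentum=0.9", mom09),
                  ("momentum=0.5", mom05), ("nesterov", nest)]:
    print(name, ups)
```

If this reading is correct, I would expect identical results only when a single optimizer step is taken, or when the optimizer is re-created before every step so the buffer never carries over.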
torch version: 1.5.1