Hi all,
I am trying to understand the source code of torch.optim.sgd (link to source). As it inherits several features from the torch.optim.optimizer class (link to source), I am also looking at that. To begin with, I have two questions:
- In the constructor of optimizer, it says self.state = defaultdict(dict). As far as I know, dict has to be a function that is called when trying to access a key in the state dictionary that is not present. Where is the dict function defined?
- The method in the SGD source code:

      def __setstate__(self, state):
          super(SGD, self).__setstate__(state)
          for group in self.param_groups:
              group.setdefault('nesterov', False)

  It calls __setstate__ from the optimizer parent, which simply does self.__dict__.update(state). So I assume this is to load an optimizer with a previous setting? Then what is the purpose of setting all the 'nesterov' arguments to False?
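To make both questions concrete, here is a minimal sketch in plain Python (no PyTorch needed) of how I understand defaultdict(dict) and dict.setdefault to behave. The key "param_0" and the two example groups are made up for illustration and are not taken from the PyTorch source.

```python
from collections import defaultdict

# dict is the built-in type; calling dict() returns a new empty dictionary,
# so defaultdict(dict) creates an empty dict for every missing key.
state = defaultdict(dict)
state["param_0"]["momentum_buffer"] = 0.5  # "param_0" is a hypothetical key
print(state["param_0"])  # {'momentum_buffer': 0.5}
print(state["unseen"])   # {}

# setdefault only fills in a key that is missing; existing values are kept.
old_group = {"lr": 0.1, "momentum": 0.9}                    # hypothetical group without 'nesterov'
new_group = {"lr": 0.1, "momentum": 0.9, "nesterov": True}  # hypothetical group that has it
for group in (old_group, new_group):
    group.setdefault("nesterov", False)

print(old_group["nesterov"])  # False
print(new_group["nesterov"])  # True
```

In this sketch, a group that already has 'nesterov' keeps its value; only the group without the key gets False.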
Best,
PiF