Possible Issues in Optimizer

  1. The `state` attribute of the optimizer class is a defaultdict, which lets new per-parameter states be added lazily as parameters are used during training. But once you save the optimizer's state_dict and load it back, the `state` attribute is a plain dict, not a defaultdict — this implicitly assumes that all of the network's parameters were already present in the state dict before saving.
    I believe that when loading the optimizer from a saved state_dict, the `state` attribute should again be a defaultdict. Such behavior was also noticed by this post.
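A minimal sketch of the difference, in plain Python without torch (the keys "p1"/"p2" stand in for parameter objects):

```python
from collections import defaultdict

# During training, the optimizer lazily creates per-parameter state:
state = defaultdict(dict)
state["p1"]["step"] = 0          # missing key auto-creates an empty dict

# Saving and loading round-trips through a plain dict:
saved = dict(state)
restored = dict(saved)           # plain dict, not a defaultdict

# A parameter that had no state before saving now raises:
try:
    restored["p2"]["step"] = 0
except KeyError:
    pass                         # KeyError: 'p2'

# Restoring into a defaultdict keeps the lazy-creation behavior:
restored = defaultdict(dict, saved)
restored["p2"]["step"] = 0       # works again
```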

  2. In the `params` key of the optimizer's param_groups, the order of the parameters (the order in which they were passed to the optimizer's __init__) matters.
    This snippet from load_state_dict shows why:
```python
id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}
state = {id_map.get(k, k): v for k, v in state_dict['state'].items()}
```

Now consider this model (when using, say, the Adam optimizer):
```python
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.p1 = nn.Linear(2, 3, False)
        self.p2 = nn.Linear(3, 4, False)
```

Now, after saving, if the order in which the parameters are defined in the model changes, i.e. if I change the class to have
```python
self.p2 = nn.Linear(3, 4, False)
self.p1 = nn.Linear(2, 3, False)
```
then the loaded optimizer's state for p1 will be mapped to p2 and vice versa. I tried this and it indeed happens, which is wrong: training cannot proceed (step() will, rightly so, raise an error because the state shapes no longer match the parameters).
The nn.Module class is robust to such reordering because it keys its state_dict by parameter names instead of relying on id ordering.
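The mismapping can be reproduced without torch. This sketch uses strings in place of parameter objects and ids, and fake `shape` entries in place of real optimizer state, but the zip-based mapping is the one quoted above:

```python
from itertools import chain

# Saved groups recorded params in the old definition order (p1, then p2)...
saved_groups = [{'params': ['id_p1', 'id_p2']}]
# ...but the rebuilt model now yields them in the new order (p2, then p1):
groups = [{'params': ['p2', 'p1']}]

# Positional zip from the load_state_dict snippet:
id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}

saved_state = {'id_p1': {'shape': (3, 2)}, 'id_p2': {'shape': (4, 3)}}
state = {id_map.get(k, k): v for k, v in saved_state.items()}

# p1's state (shape (3, 2)) is now attached to p2, and vice versa:
print(state)  # {'p2': {'shape': (3, 2)}, 'p1': {'shape': (4, 3)}}
```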

Shouldn't the optimizer also use parameter names instead of ids, rather than relying on the order in which parameters are supplied at initialization?
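A hypothetical sketch of what name-keyed state would look like (the names and dict layout here are assumptions for illustration, not the actual PyTorch API): keying state by qualified parameter name, as nn.Module.state_dict() does, makes loading independent of definition order.

```python
# State saved under parameter names rather than positional ids:
saved_state = {'p1.weight': {'shape': (3, 2)}, 'p2.weight': {'shape': (4, 3)}}

# The rebuilt model defines p2 before p1, but the names still match.
# ('P2'/'P1' stand in for the live parameter objects.)
named_params = [('p2.weight', 'P2'), ('p1.weight', 'P1')]

state = {param: saved_state[name] for name, param in named_params}
print(state)  # {'P2': {'shape': (4, 3)}, 'P1': {'shape': (3, 2)}}
```

Each parameter recovers its own state regardless of the order in which the modules are declared.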

This is totally worth fixing IMO. Can you open an issue with the exact contents on https://github.com/pytorch/pytorch ?

Cool. I opened two issues, #1488 and #1489