- The `state` attribute in the optimizer class is a `defaultdict`, which lets new parameter states be added to the `state` dict lazily, as the parameters are first used during training. But once you save the optimizer `state_dict` and load it back, the `state` attribute is a plain `dict` and not a `defaultdict`, which assumes that all the parameters in the network were already present in the `state` dict before saving. I believe that when loading the optimizer from a saved `state_dict`, the `state` attribute should be a `defaultdict`. Such behavior was also noticed by this post.
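A minimal sketch of the round trip (the model and optimizer here are hypothetical stand-ins; whether `state` stays a `defaultdict` after loading may depend on the PyTorch version):

```python
import torch
import torch.nn as nn
from collections import defaultdict

model = nn.Linear(2, 3, bias=False)
opt = torch.optim.Adam(model.parameters())
print(isinstance(opt.state, defaultdict))   # True: entries are created lazily

# Populate the state with one step, then round-trip the state_dict.
model(torch.randn(1, 2)).sum().backward()
opt.step()

opt2 = torch.optim.Adam(model.parameters())
opt2.load_state_dict(opt.state_dict())
print(isinstance(opt2.state, defaultdict))  # False on the versions described here
```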
- In the optimizer's `params` key of `param_groups`, the order of the parameters (the order in which they were given to the optimizer's `__init__`) matters. In `load_state_dict`, this snippet shows the mapping:
```python
id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}
state = {id_map.get(k, k): v for k, v in state_dict['state'].items()}
```
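To make the positional pairing concrete, here is a toy illustration of what the `zip` above does; the integer ids and placeholder strings are made up and stand in for the saved ids and the live `Parameter` objects:

```python
from itertools import chain

saved_groups = [{'params': [101, 102]}]        # ids recorded at save time
groups = [{'params': ['param_a', 'param_b']}]  # current params, in init order

id_map = {old_id: p for old_id, p in
          zip(chain(*(g['params'] for g in saved_groups)),
              chain(*(g['params'] for g in groups)))}
print(id_map)  # {101: 'param_a', 102: 'param_b'}: matched purely by position
```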
Now consider this model (when using, say, the Adam optimizer):
```python
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.p1 = nn.Linear(2, 3, False)
        self.p2 = nn.Linear(3, 4, False)
```
Now after saving, if the order in which the parameters are defined in the model changes, i.e. if I change the class to have
```python
self.p2 = nn.Linear(3, 4, False)
self.p1 = nn.Linear(2, 3, False)
```
then the loaded optimizer's state for `p1` will be mapped to `p2` and vice versa. I tried this and it does indeed happen, which is wrong, and training cannot proceed afterwards (`step()` will, rightly so, raise an error, since the saved state tensors no longer match the shapes of the parameters they are attached to).
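A sketch of that reproduction (`ModelA`/`ModelB` are just names I give the two orderings; the exact failure mode may vary across PyTorch versions):

```python
import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.p1 = nn.Linear(2, 3, False)  # weight shape (3, 2)
        self.p2 = nn.Linear(3, 4, False)  # weight shape (4, 3)

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.p2 = nn.Linear(3, 4, False)  # same layers, definition order swapped
        self.p1 = nn.Linear(2, 3, False)

a = ModelA()
opt_a = torch.optim.Adam(a.parameters())
a.p2(a.p1(torch.randn(1, 2))).sum().backward()
opt_a.step()  # populates exp_avg / exp_avg_sq for both weights

b = ModelB()
opt_b = torch.optim.Adam(b.parameters())
opt_b.load_state_dict(opt_a.state_dict())

# The buffers saved for p1 (shape (3, 2)) are now attached to p2 (shape (4, 3))
# because the mapping is purely positional.
print(opt_b.state[b.p2.weight]['exp_avg'].shape)  # torch.Size([3, 2])

b.p2(b.p1(torch.randn(1, 2))).sum().backward()
opt_b.step()  # raises a size-mismatch error: state and parameter shapes disagree
```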
The `nn.Module` class is robust to such reordering because its `state_dict` is keyed by parameter names instead of relying on `id` ordering.
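For contrast, the same reordering is harmless at the module level (reusing the hypothetical `ModelA`/`ModelB` from the sketch above), since `state_dict` entries are matched by name:

```python
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.p1 = nn.Linear(2, 3, False)
        self.p2 = nn.Linear(3, 4, False)

class ModelB(nn.Module):
    def __init__(self):
        super().__init__()
        self.p2 = nn.Linear(3, 4, False)
        self.p1 = nn.Linear(2, 3, False)

sd = ModelA().state_dict()
print(list(sd.keys()))        # ['p1.weight', 'p2.weight']: keyed by name
ModelB().load_state_dict(sd)  # loads correctly despite the reordered definitions
```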
Shouldn't the optimizer also use parameter names instead of ids, rather than relying on the order in which the parameters are supplied to the optimizer at initialization?