Load state_dict from an older PyTorch version

I recently updated my PyTorch to v1.5 and tried to load trained weights saved with a previous version.
I got the following error:

RuntimeError: Error(s) in loading state_dict for DistributedDataParallel:
	Missing key(s) in state_dict: "module.total_ops", "module.total_params", "module.layer1.total_ops", "module.layer1.total_params", "module.layer1.0.total_ops", "module.layer1.0.total_params", "module.layer1.0.downsample.total_ops", "module.layer1.0.downsample.total_params", "module.layer1.1.total_ops", "module.layer1.1.total_params", "module.layer1.2.total_ops", "module.layer1.2.total_params", "module.layer2.total_ops", "module.layer2.total_params", "module.layer2.0.total_ops", "module.layer2.0.total_params", "module.layer2.0.downsample.total_ops", "module.layer2.0.downsample.total_params", "module.layer2.1.total_ops", "module.layer2.1.total_params", "module.layer2.2.total_ops", "module.layer2.2.total_params", "module.layer2.3.total_ops", "module.layer2.3.total_params", "module.layer3.total_ops", "module.layer3.total_params", "module.layer3.0.total_ops", "module.layer3.0.total_params", "module.layer3.0.conv1.total_ops", "module.layer3.0.conv1.total_params", "module.layer3.0.downsample.total_ops", "module.layer3.0.downsample.total_params", "module.layer3.1.total_ops", "module.layer3.1.total_params", "module.layer3.1.conv1.total_ops", "module.layer3.1.conv1.total_params", "module.layer3.2.total_ops", "module.layer3.2.total_params", "module.layer3.2.conv1.total_ops", "module.layer3.2.conv1.total_params", "module.layer3.3.total_ops", "module.layer3.3.total_params", "module.layer3.3.conv1.total_ops", "module.layer3.3.conv1.total_params", "module.layer3.4.total_ops", "module.layer3.4.total_params", "module.layer3.4.conv1.total_ops", "module.layer3.4.conv1.total_params", "module.layer3.5.total_ops", "module.layer3.5.total_params", "module.layer3.5.conv1.total_ops", "module.layer3.5.conv1.total_params", "module.layer4.total_ops", "module.layer4.total_params", "module.layer4.0.total_ops", "module.layer4.0.total_params", "module.layer4.0.conv1.total_ops", "module.layer4.0.conv1.total_params", "module.layer4.0.downsample.total_ops", "module.layer4.0.downsample.total_params", "module.layer4.1.total_ops", "module.layer4.1.total_params", "module.layer4.1.conv1.total_ops", "module.layer4.1.conv1.total_params", "module.layer4.2.total_ops", "module.layer4.2.total_params", "module.layer4.2.conv1.total_ops", "module.layer4.2.conv1.total_params"

It seems each module has new states, total_params and total_ops.
What values are supposed to be in those states?
Is there a best practice for loading a state_dict from a previous version?
If I pass strict=False to load_state_dict, it runs without error,
but then I cannot check whether the other parameters were loaded properly.
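
One way to check: even with strict=False, load_state_dict returns a named tuple of missing and unexpected keys, so the mismatches can still be inspected. A minimal sketch with a stand-in model (substitute your own model and checkpoint):

```python
import torch
import torch.nn as nn

# Tiny stand-in model; substitute your own model and checkpoint.
model = nn.Sequential(nn.Linear(4, 4))

# Simulate an "old" checkpoint taken before the extra buffer existed.
old_state = model.state_dict()
model[0].register_buffer("total_ops", torch.zeros(1))  # mimic what thop adds

# strict=False still reports exactly what did not match.
result = model.load_state_dict(old_state, strict=False)
print("missing keys:", result.missing_keys)        # ['0.total_ops']
print("unexpected keys:", result.unexpected_keys)  # []

# Fail loudly if anything besides the counters is mismatched.
assert all(k.endswith(("total_ops", "total_params"))
           for k in result.missing_keys + result.unexpected_keys)
```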

Did you add these parameters manually to your model? I cannot find them in the DistributedDataParallel implementation.

No, I didn’t. I thought they were newly added in PyTorch 1.5.
On second thought, they might be added by the thop package (link), since it counts ops and params.
But I don’t know why those values are added as member variables of each module in PyTorch 1.5 and not in 1.1.
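
A quick way to confirm this (assuming thop and torchvision are installed): thop registers total_ops/total_params as buffers on each submodule, and buffers are part of the state_dict, so they show up as missing keys when an older checkpoint is loaded afterwards.

```python
import torch
from torchvision.models import resnet18
from thop import profile

model = resnet18()
before = set(model.state_dict().keys())

# Profiling mutates the model: thop adds counter buffers to every submodule.
profile(model, inputs=(torch.randn(1, 3, 224, 224),), verbose=False)
after = set(model.state_dict().keys())

# e.g. ['conv1.total_ops', 'conv1.total_params', 'layer1.0.total_ops', ...]
print(sorted(after - before))
```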

I found that it happens when I call thop.profile before loading the state_dict.
I could solve it by simply moving the thop.profile call after the loading.
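
In code, the fix looks roughly like this ("checkpoint.pth" is a placeholder for the actual checkpoint); profiling a deep copy avoids polluting the live model at all:

```python
import copy
import torch
from torchvision.models import resnet18
from thop import profile

model = resnet18()

# Load first, profile afterwards, so the counter buffers never enter
# the comparison against the checkpoint.
model.load_state_dict(torch.load("checkpoint.pth", map_location="cpu"))

# Optional: profile a throwaway copy so the live model's state_dict
# stays free of thop's buffers entirely.
macs, params = profile(copy.deepcopy(model),
                       inputs=(torch.randn(1, 3, 224, 224),))
```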
