RuntimeError: Error(s) in loading state_dict for BNInception

When I train tsn-pytorch with the following command:

python main.py ucf101 RGB /home/xu/Datasets/ucfTrainTestlist/trainlist01.txt /home/xu/Datasets/ucfTrainTestlist/testlist01.txt --arch BNInception --num_segments 3 --gd 20 --lr 0.001 --lr_steps 30 60 --epochs 80 -b 16 -j 2 --dropout 0.8 --snapshot_pref ucf101_bninception_

However, it shows the following error:

Initializing TSN with base model: BNInception.
TSN Configurations:
    input_modality:     RGB
    num_segments:       3
    new_length:         1
    consensus_module:   avg
    dropout_ratio:      0.8
        
Traceback (most recent call last):
  File "main.py", line 301, in <module>
    main()
  File "main.py", line 35, in main
    consensus_type=args.consensus_type, dropout=args.dropout, partial_bn=not args.no_partialbn)
  File "/home/xu/HAR/project/TSN/tsn-pytorch/models.py", line 39, in __init__
    self._prepare_base_model(base_model)
  File "/home/xu/HAR/project/TSN/tsn-pytorch/models.py", line 96, in _prepare_base_model
    self.base_model = getattr(tf_model_zoo, base_model)()
  File "/home/xu/HAR/project/TSN/tsn-pytorch/tf_model_zoo/bninception/pytorch_load.py", line 35, in __init__
    self.load_state_dict(torch.utils.model_zoo.load_url(weight_url))
  File "/home/xu/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for BNInception:
	size mismatch for conv1_7x7_s2_bn.weight: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.
	size mismatch for conv1_7x7_s2_bn.bias: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.
	size mismatch for conv1_7x7_s2_bn.running_mean: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.
	size mismatch for conv1_7x7_s2_bn.running_var: copying a param of torch.Size([64]) from checkpoint, where the shape is torch.Size([1, 64]) in current model.


...

size mismatch for inception_5b_pool_proj_bn.running_var: copying a param of torch.Size([128]) from checkpoint, where the shape is torch.Size([1, 128]) in current model.

I need help with this; any help is appreciated.
Regards

I encountered the same error when I ran the program. How can I solve this problem?

D:\Software\Anaconda3\envs\pytorch\python.exe D:/Python_project/SSD/demo/demo.py
Loading weights into state dict...
Traceback (most recent call last):
  File "D:/Python_project/SSD/demo/demo.py", line 22, in <module>
    net.load_weights('../weights/ssd300_mAP_77.43_v2.pth')
  File "D:\Python_project\SSD\ssd.py", line 121, in load_weights
    map_location=lambda storage, loc: storage))
  File "D:\Software\Anaconda3\envs\pytorch\lib\site-packages\torch\nn\modules\module.py", line 839, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for SSD:
	size mismatch for conf.0.weight: copying a param with shape torch.Size([4, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([84, 512, 3, 3]).
	size mismatch for conf.0.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([84]).
	size mismatch for conf.1.weight: copying a param with shape torch.Size([6, 1024, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 1024, 3, 3]).
	size mismatch for conf.1.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([126]).
	size mismatch for conf.2.weight: copying a param with shape torch.Size([6, 512, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 512, 3, 3]).
	size mismatch for conf.2.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([126]).
	size mismatch for conf.3.weight: copying a param with shape torch.Size([6, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([126, 256, 3, 3]).
	size mismatch for conf.3.bias: copying a param with shape torch.Size([6]) from checkpoint, the shape in current model is torch.Size([126]).
	size mismatch for conf.4.weight: copying a param with shape torch.Size([4, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([84, 256, 3, 3]).
	size mismatch for conf.4.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([84]).
	size mismatch for conf.5.weight: copying a param with shape torch.Size([4, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([84, 256, 3, 3]).
	size mismatch for conf.5.bias: copying a param with shape torch.Size([4]) from checkpoint, the shape in current model is torch.Size([84]).

Process finished with exit code 1

I have solved this problem. It is actually caused by this check in the _load_from_state_dict function:

                if input_param.shape != param.shape:
                    # local shape should match the one in checkpoint
                    error_msgs.append('size mismatch for {}: copying a param with shape {} from checkpoint, '
                                      'the shape in current model is {}.'
                                      .format(key, input_param.shape, param.shape))

It just means that the param shape in your checkpoint file (torch.Size([64])) is different from the shape in the model you defined (torch.Size([1, 64])). So what you should do is either modify the parameters of the conv1_7x7_s2_bn layers (and the other mismatched BN layers) in the model definition, or modify your checkpoint file so the shapes match. This is just my personal suggestion; hope it helps.
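
For the BNInception case, one way to implement that second suggestion is to adapt the checkpoint tensors to the model's shapes before loading them. Below is a minimal sketch of the idea (load_reshaped_state_dict is just a hypothetical helper name, not something from the repo): it reshapes any checkpoint tensor whose element count already matches the corresponding model parameter, e.g. [64] to [1, 64], and then loads the adjusted state_dict.

    import torch
    import torch.utils.model_zoo

    def load_reshaped_state_dict(model, state_dict):
        # Reshape checkpoint tensors to the model's parameter shapes when only
        # the layout differs (same number of elements), then load as usual.
        model_state = model.state_dict()
        adapted = {}
        for key, ckpt_param in state_dict.items():
            if key in model_state and ckpt_param.numel() == model_state[key].numel():
                # e.g. checkpoint torch.Size([64]) vs model torch.Size([1, 64])
                adapted[key] = ckpt_param.reshape(model_state[key].shape)
            else:
                adapted[key] = ckpt_param
        model.load_state_dict(adapted)

    # In tf_model_zoo/bninception/pytorch_load.py, instead of
    #     self.load_state_dict(torch.utils.model_zoo.load_url(weight_url))
    # you would call something like:
    #     load_reshaped_state_dict(self, torch.utils.model_zoo.load_url(weight_url))

Note that this only helps when the element counts match, as in the BNInception error above. The SSD mismatch (e.g. [4, 512, 3, 3] vs [84, 512, 3, 3]) involves a different number of elements, which usually means the model was built with a different configuration (for example a different number of classes) than the checkpoint, so that one has to be fixed in the model definition or by using a matching checkpoint.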