Tensor size mismatch

I am using a modified predict.py to test a pruned SqueezeNet model

@Amrit_Das


[phung@archlinux SqueezeNet-Pruning]$ python predict.py --image 3_100.jpg --model model_prunned --num_class 2
/usr/lib/python3.7/site-packages/torchvision/models/squeezenet.py:94: UserWarning: nn.init.kaiming_uniform is now deprecated in favor of nn.init.kaiming_uniform_.
  init.kaiming_uniform(m.weight.data)
/usr/lib/python3.7/site-packages/torchvision/models/squeezenet.py:92: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
  init.normal(m.weight.data, mean=0.0, std=0.01)
Traceback (most recent call last):
  File "predict.py", line 30, in <module>
    model.load_state_dict(checkpoint)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 723, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ModifiedSqueezeNetModel:
	size mismatch for features.0.weight: copying a param of torch.Size([28, 3, 3, 3]) from checkpoint, where the shape is torch.Size([64, 3, 3, 3]) in current model.
	size mismatch for features.0.bias: copying a param of torch.Size([28]) from checkpoint, where the shape is torch.Size([64]) in current model.
	size mismatch for features.3.squeeze.weight: copying a param of torch.Size([16, 28, 1, 1]) from checkpoint, where the shape is torch.Size([16, 64, 1, 1]) in current model.
	size mismatch for features.3.expand1x1.weight: copying a param of torch.Size([43, 16, 1, 1]) from checkpoint, where the shape is torch.Size([64, 16, 1, 1]) in current model.
	size mismatch for features.3.expand1x1.bias: copying a param of torch.Size([43]) from checkpoint, where the shape is torch.Size([64]) in current model.
	size mismatch for features.3.expand3x3.weight: copying a param of torch.Size([41, 16, 3, 3]) from checkpoint, where the shape is torch.Size([64, 16, 3, 3]) in current model.
	size mismatch for features.3.expand3x3.bias: copying a param of torch.Size([41]) from checkpoint, where the shape is torch.Size([64]) in current model.
	size mismatch for features.4.squeeze.weight: copying a param of torch.Size([16, 84, 1, 1]) from checkpoint, where the shape is torch.Size([16, 128, 1, 1]) in current model.
	size mismatch for features.4.expand1x1.weight: copying a param of torch.Size([38, 16, 1, 1]) from checkpoint, where the shape is torch.Size([64, 16, 1, 1]) in current model.
	size mismatch for features.4.expand1x1.bias: copying a param of torch.Size([38]) from checkpoint, where the shape is torch.Size([64]) in current model.
	size mismatch for features.4.expand3x3.weight: copying a param of torch.Size([29, 16, 3, 3]) from checkpoint, where the shape is torch.Size([64, 16, 3, 3]) in current model.
	size mismatch for features.4.expand3x3.bias: copying a param of torch.Size([29]) from checkpoint, where the shape is torch.Size([64]) in current model.
	size mismatch for features.6.squeeze.weight: copying a param of torch.Size([32, 67, 1, 1]) from checkpoint, where the shape is torch.Size([32, 128, 1, 1]) in current model.
	size mismatch for features.6.expand1x1.weight: copying a param of torch.Size([79, 32, 1, 1]) from checkpoint, where the shape is torch.Size([128, 32, 1, 1]) in current model.
	size mismatch for features.6.expand1x1.bias: copying a param of torch.Size([79]) from checkpoint, where the shape is torch.Size([128]) in current model.
	size mismatch for features.6.expand3x3.weight: copying a param of torch.Size([65, 32, 3, 3]) from checkpoint, where the shape is torch.Size([128, 32, 3, 3]) in current model.
	size mismatch for features.6.expand3x3.bias: copying a param of torch.Size([65]) from checkpoint, where the shape is torch.Size([128]) in current model.
	size mismatch for features.7.squeeze.weight: copying a param of torch.Size([32, 144, 1, 1]) from checkpoint, where the shape is torch.Size([32, 256, 1, 1]) in current model.
	size mismatch for features.7.expand1x1.weight: copying a param of torch.Size([80, 32, 1, 1]) from checkpoint, where the shape is torch.Size([128, 32, 1, 1]) in current model.
	size mismatch for features.7.expand1x1.bias: copying a param of torch.Size([80]) from checkpoint, where the shape is torch.Size([128]) in current model.
	size mismatch for features.7.expand3x3.weight: copying a param of torch.Size([53, 32, 3, 3]) from checkpoint, where the shape is torch.Size([128, 32, 3, 3]) in current model.
	size mismatch for features.7.expand3x3.bias: copying a param of torch.Size([53]) from checkpoint, where the shape is torch.Size([128]) in current model.
	size mismatch for features.9.squeeze.weight: copying a param of torch.Size([48, 133, 1, 1]) from checkpoint, where the shape is torch.Size([48, 256, 1, 1]) in current model.
	size mismatch for features.9.expand1x1.weight: copying a param of torch.Size([84, 48, 1, 1]) from checkpoint, where the shape is torch.Size([192, 48, 1, 1]) in current model.
	size mismatch for features.9.expand1x1.bias: copying a param of torch.Size([84]) from checkpoint, where the shape is torch.Size([192]) in current model.
	size mismatch for features.9.expand3x3.weight: copying a param of torch.Size([83, 48, 3, 3]) from checkpoint, where the shape is torch.Size([192, 48, 3, 3]) in current model.
	size mismatch for features.9.expand3x3.bias: copying a param of torch.Size([83]) from checkpoint, where the shape is torch.Size([192]) in current model.
	size mismatch for features.10.squeeze.weight: copying a param of torch.Size([48, 167, 1, 1]) from checkpoint, where the shape is torch.Size([48, 384, 1, 1]) in current model.
	size mismatch for features.10.expand1x1.weight: copying a param of torch.Size([82, 48, 1, 1]) from checkpoint, where the shape is torch.Size([192, 48, 1, 1]) in current model.
	size mismatch for features.10.expand1x1.bias: copying a param of torch.Size([82]) from checkpoint, where the shape is torch.Size([192]) in current model.
	size mismatch for features.10.expand3x3.weight: copying a param of torch.Size([81, 48, 3, 3]) from checkpoint, where the shape is torch.Size([192, 48, 3, 3]) in current model.
	size mismatch for features.10.expand3x3.bias: copying a param of torch.Size([81]) from checkpoint, where the shape is torch.Size([192]) in current model.
	size mismatch for features.11.squeeze.weight: copying a param of torch.Size([64, 163, 1, 1]) from checkpoint, where the shape is torch.Size([64, 384, 1, 1]) in current model.
	size mismatch for features.11.expand1x1.weight: copying a param of torch.Size([76, 64, 1, 1]) from checkpoint, where the shape is torch.Size([256, 64, 1, 1]) in current model.
	size mismatch for features.11.expand1x1.bias: copying a param of torch.Size([76]) from checkpoint, where the shape is torch.Size([256]) in current model.
	size mismatch for features.11.expand3x3.weight: copying a param of torch.Size([68, 64, 3, 3]) from checkpoint, where the shape is torch.Size([256, 64, 3, 3]) in current model.
	size mismatch for features.11.expand3x3.bias: copying a param of torch.Size([68]) from checkpoint, where the shape is torch.Size([256]) in current model.
	size mismatch for features.12.squeeze.weight: copying a param of torch.Size([64, 144, 1, 1]) from checkpoint, where the shape is torch.Size([64, 512, 1, 1]) in current model.
	size mismatch for features.12.expand1x1.weight: copying a param of torch.Size([16, 64, 1, 1]) from checkpoint, where the shape is torch.Size([256, 64, 1, 1]) in current model.
	size mismatch for features.12.expand1x1.bias: copying a param of torch.Size([16]) from checkpoint, where the shape is torch.Size([256]) in current model.
	size mismatch for features.12.expand3x3.weight: copying a param of torch.Size([14, 64, 3, 3]) from checkpoint, where the shape is torch.Size([256, 64, 3, 3]) in current model.
	size mismatch for features.12.expand3x3.bias: copying a param of torch.Size([14]) from checkpoint, where the shape is torch.Size([256]) in current model.
	size mismatch for classifier.1.weight: copying a param of torch.Size([2, 30, 1, 1]) from checkpoint, where the shape is torch.Size([2, 512, 1, 1]) in current model.
[phung@archlinux SqueezeNet-Pruning]$
Pruned SqueezeNet model:
ModifiedSqueezeNetModel(
  (classifier): Sequential(
    (0): Dropout(p=0.5)
    (1): Conv2d(30, 2, kernel_size=(1, 1), stride=(1, 1))
    (2): ReLU(inplace)
    (3): AvgPool2d(kernel_size=13, stride=1, padding=0)
  )
  (features): Sequential(
    (0): Conv2d(3, 28, kernel_size=(3, 3), stride=(2, 2))
    (1): ReLU(inplace)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (3): Fire(
      (squeeze): Conv2d(28, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 43, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 41, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (4): Fire(
      (squeeze): Conv2d(84, 16, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(16, 38, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(16, 29, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (6): Fire(
      (squeeze): Conv2d(67, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 79, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 65, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (7): Fire(
      (squeeze): Conv2d(144, 32, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(32, 80, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(32, 53, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (8): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=True)
    (9): Fire(
      (squeeze): Conv2d(133, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 84, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 83, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (10): Fire(
      (squeeze): Conv2d(167, 48, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(48, 82, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(48, 81, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (11): Fire(
      (squeeze): Conv2d(163, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 76, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 68, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
    (12): Fire(
      (squeeze): Conv2d(144, 64, kernel_size=(1, 1), stride=(1, 1))
      (squeeze_activation): ReLU(inplace)
      (expand1x1): Conv2d(64, 16, kernel_size=(1, 1), stride=(1, 1))
      (expand1x1_activation): ReLU(inplace)
      (expand3x3): Conv2d(64, 14, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (expand3x3_activation): ReLU(inplace)
    )
  )
)

Hi,

It looks like you’re trying to load pretrained weights into a model that has a different size than the original one.

Are you saying that the pruned SqueezeNet model (model_prunned) has different tensor sizes compared to the original SqueezeNet model?

If the answer to the above question is yes, how do I work around this restriction in the actual Python code (i.e., how should I modify the model-loading segment in predict.py)?

@Amrit_Das @albanD

Here’s a suggestion: try importing both models and either print the model or compute its summary. In either case you will know whether the input/output parameters are the same or different. Post the output here.

In case they are different, add another layer to your model and match the outputs.

You can check the model summary in the following ways:

from torchvision import models
model = models.vgg16()
print(model)

or

from torchvision import models
from torchsummary import summary

vgg = models.vgg16()
summary(vgg, (3, 224, 224))

Here 3 is the number of input channels, and 224, 224 are the input height and width.

@Amrit_Das @albanD

See this diff-check result between the original SqueezeNet model (left) and the pruned SqueezeNet model (right).

So, how do I work around this restriction (the tensor size difference) in the actual Python code, i.e., how should I modify the model-loading segment in predict.py for the smaller, pruned SqueezeNet model?

As you can see in the diff, considering only the first layer, it takes an input with 3 channels and outputs a tensor with 64 channels in the original model and 28 in yours.
This means that the weights of the first convolution are of size 64x3x3x3 in the original model and 28x3x3x3 in your case.
As you can see, these two tensors have completely different numbers of elements.
So there is no way to copy one into the other; you would need to reduce the size of the original tensor to make it fit into your smaller one.
But then your first convolution would not do the same thing anymore and its output would be different. So the accuracy of the small model will be completely different from the accuracy of the original one.
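
To make the size argument concrete, here is a minimal sketch comparing the element counts of the two first-layer weight tensors (shapes taken from the error message above):

import torch

original_weight = torch.empty(64, 3, 3, 3)  # first conv weight in the stock SqueezeNet
pruned_weight = torch.empty(28, 3, 3, 3)    # first conv weight in the pruned model
print(original_weight.numel())              # 1728 elements
print(pruned_weight.numel())                # 756 elements
# pruned_weight.copy_(original_weight)      # would raise a RuntimeError: the shapes do not match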

As @albanD said, the two models are completely different, so it would not be a good idea to find a way around this or to modify the architecture of the model. Rather, it would be advisable to choose a similar kind of model.

@Amrit_Das what exactly do you mean by a similar model? I am doing SqueezeNet model pruning here, so there is not going to be any existing model that fits my model_prunned 100% without any tensor size mismatch.

This is the problem: how am I going to load my model_prunned? And there is one more problem: the pruning is not deterministic, so manually crafting the tensor sizes to fit model_prunned will not be possible for every pruning run.

In that case, you will have to train your model from scratch. You cannot use a pretrained model to initialize it, I’m afraid.

There must be some other way.

Otherwise, what is the point of doing pruning?

I guess the whole problem of pruning is finding a way :slight_smile:
I’m not really familiar with that topic though.

@Amrit_Das

Do you have any idea how to load a “pruned” model without triggering the tensor size mismatch error?

Sorry, but I have never worked with pruning techniques either.

@Amrit_Das @albanD

I will try the second method of saving and loading a model described here:

https://pytorch.org/docs/master/notes/serialization.html

This will eliminate the use of the_model = TheModelClass(*args, **kwargs), i.e. the need to first instantiate the architecture and then initialize the pruned model from the pre-trained one.
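
Something along these lines (a sketch; `model` is the pruned network object in the fine-tuning script, and the ModifiedSqueezeNetModel class definition, e.g. finetune.py, must be importable when torch.load runs):

import torch

# In the pruning/fine-tuning script: save the entire model object,
# not just its state_dict.
torch.save(model, "model_prunned")

# In predict.py: torch.load reconstructs the ModifiedSqueezeNetModel directly,
# so there is no need to build one with matching layer sizes and then call
# load_state_dict on it.
model = torch.load("model_prunned")
model.eval()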

I will post an update on whether this works.

Finally, the tensor size mismatch error has disappeared, at least for now.

However, I am now facing a CPU/GPU issue with this line of code: input = Variable(image_tensor).cuda()

See below for the errors before and after I appended .cuda() to that line of code.

[phung@archlinux SqueezeNet-Pruning]$ python predict.py --image 3_100.jpg --model model_prunned --num_class 2
prediction in progress
Traceback (most recent call last):
  File "predict.py", line 66, in <module>
    prediction = predict_image(imagepath)
  File "predict.py", line 50, in predict_image
    output = model(input)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/phung/Documents/Grive/Personal/Coursera/Machine_Learning/pruning/Pruning-CNN/SqueezeNet-Pruning/finetune.py", line 39, in forward
    x = self.features(x)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
[phung@archlinux SqueezeNet-Pruning]$ python predict.py --image 3_100.jpg --model model_prunned --num_class 2
prediction in progress
Traceback (most recent call last):
  File "predict.py", line 66, in <module>
    prediction = predict_image(imagepath)
  File "predict.py", line 52, in predict_image
    index = output.data.numpy().argmax()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
[phung@archlinux SqueezeNet-Pruning]$

The error is because NumPy does not support CUDA, so you have to make a CPU copy first.

import torch

V_tensor = torch.rand(9).cuda()    # tensor living on the GPU
np_array = V_tensor.cpu().numpy()  # copy it to host memory before converting to NumPy

Have a look at this discussion and you’ll see what the error is!
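
Applied to the traceback above, the minimal change would be on line 52 of predict.py (a sketch; the rest of predict_image is assumed unchanged):

# the model output lives on the GPU, so copy it to host memory before the NumPy conversion
index = output.data.cpu().numpy().argmax()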

In the current context, is there an option that avoids copying the tensor data to host (CPU) memory?

Please advise.

You can keep a PyTorch tensor that is GPU-backed.
But if you need a NumPy array, you cannot avoid the copy, as NumPy does not support the GPU.
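
If all you need from the output is the predicted class index, one option (a sketch, assuming output has shape [1, num_class] as produced in predict.py) is to do the argmax with PyTorch on the GPU and transfer only the resulting scalar:

import torch

# torch.argmax runs on the GPU; .item() copies only a single integer back to the host
index = torch.argmax(output, dim=1).item()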

Thanks, but I am still confused about how to modify the rest of the code to keep everything as a PyTorch tensor on the GPU.

I suppose I shall open a different thread for this GPU/CPU backend issue, since the current thread topic seems to have been solved, at least for now. I will confirm the solution later once I am done with the code modification.