RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 24 and 195 in dimension 0 at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/TH/generic/THTensor.cpp:689

I am trying to use the tracing mechanism (torch.jit.trace) to trace my Python model. Below is my nn.Module:

class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN. Currently supports boxes and masks.
    It consists of three main parts:
    - backbone
    - rpn
    - heads: takes the features + the proposals from the RPN and computes
        detections / masks from it.
    """

    def __init__(self, cfg):
        super(GeneralizedRCNN, self).__init__()

        self.backbone = build_backbone(cfg)
        self.rpn = build_rpn(cfg, self.backbone.out_channels)
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

    def forward(self, images, targets=None):
        """
        Arguments:
            images (list[Tensor] or ImageList): images to be processed
            targets (list[BoxList]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        images = to_image_list(images)
        features = self.backbone(images.tensors)
        proposals, proposal_losses = self.rpn(images, features, targets)
        if self.roi_heads:
            print(type(features))
            print(type(proposals))
            print(type(targets))
            x, result, detector_losses = self.roi_heads(features, proposals, targets)
        else:
            # RPN-only models don't have roi_heads
            x = features
            result = proposals
            detector_losses = {}

        if self.training:
            losses = {}
            losses.update(detector_losses)
            losses.update(proposal_losses)
            return losses

        print("result.....")
        print(result[0].bbox.size())
        print(result[0].bbox)
        return result[0].bbox
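
torch.jit.trace expects the traced module to return tensors (for example a single tensor or a tuple of tensors). A custom result object such as BoxList cannot be a trace output, which is presumably why forward above returns result[0].bbox rather than the BoxList itself. A minimal sketch of the same idea as a separate wrapper module, so the original model can stay unchanged. ToyDetector, BoxResult and BBoxOnly are hypothetical names for illustration and are not part of maskrcnn-benchmark:

```python
import torch
import torch.nn as nn


class BoxResult:
    """Hypothetical stand-in for a non-tensor result container like BoxList."""

    def __init__(self, bbox: torch.Tensor):
        self.bbox = bbox


class ToyDetector(nn.Module):
    """Hypothetical detector that returns a list of BoxResult objects."""

    def __init__(self):
        super().__init__()
        self.head = nn.Linear(4, 4)

    def forward(self, x: torch.Tensor):
        # The tracer cannot return this list of custom objects directly
        return [BoxResult(self.head(x))]


class BBoxOnly(nn.Module):
    """Wrapper that exposes only a tensor output, which the tracer can record."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        results = self.model(x)
        return results[0].bbox  # keep only the first result's boxes


model = BBoxOnly(ToyDetector()).eval()
example = torch.randn(2, 4)
traced = torch.jit.trace(model, (example,))  # records only tensor ops
```

The wrapper keeps the tracing concern out of the model class, so forward can keep returning BoxList objects for normal (untraced) use.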

When I run the script, the print statements at the end of forward produce the following output:

result.....
torch.Size([24, 4])
tensor([[0.0000e+00, 6.6122e+02, 4.0417e+01, 7.1402e+02],
[0.0000e+00, 7.7747e+02, 9.3248e+01, 7.9861e+02],
[1.9836e+02, 1.5207e+02, 3.1438e+02, 3.3421e+02],
[0.0000e+00, 1.1651e+02, 6.4819e+01, 3.5847e+02],
[1.5965e+02, 7.6236e+02, 3.4225e+02, 7.9900e+02],
[6.8618e+02, 1.1461e-01, 7.9371e+02, 3.6082e+01],
[0.0000e+00, 0.0000e+00, 5.6068e+01, 7.0202e+01],
[6.5861e+02, 7.8178e+01, 7.9319e+02, 3.5219e+02],
[1.3673e+02, 9.9877e+01, 3.1141e+02, 3.3045e+02],
[1.3768e+02, 2.6786e+02, 2.3104e+02, 3.3459e+02],
[6.5975e+02, 5.7610e+02, 6.7409e+02, 5.9613e+02],
[3.5417e+02, 3.3771e+02, 5.1339e+02, 4.6793e+02],
[6.6992e+02, 2.6569e+02, 7.8911e+02, 4.4689e+02],
[7.6212e+02, 7.6213e+02, 7.8742e+02, 7.8259e+02],
[1.3392e+02, 3.3078e+02, 3.3120e+02, 4.6792e+02],
[6.6482e+02, 6.3342e+02, 6.8739e+02, 6.5951e+02],
[3.7355e+02, 1.8197e+02, 6.2639e+02, 4.5210e+02],
[3.3505e+02, 9.4824e+01, 5.5984e+02, 2.3227e+02],
[4.9010e+02, 2.6308e+02, 6.2003e+02, 4.6466e+02],
[3.5421e+02, 2.6553e+02, 5.7457e+02, 4.8271e+02],
[6.8232e+02, 1.7231e+02, 7.9900e+02, 4.2151e+02],
[3.6076e+02, 1.1569e+02, 5.4964e+02, 3.8680e+02],
[5.8361e+02, 5.9198e+02, 6.5042e+02, 6.5802e+02],
[0.0000e+00, 3.4766e+02, 7.7981e+01, 4.8119e+02]])

The script that runs the trace, trace.py, is as follows:

print('loading...')
torch_model = build_detection_model(cfg.clone())
# Load the checkpoint onto the CPU
state_dict = torch.load(model_path, map_location=torch.device("cpu"))
# print(state_dict)
load_state_dict(torch_model,state_dict.pop("model"))
torch_model.eval()
print('model loaded and state dict applied successfully')
im = loadImg(sample_input)
image = transforms(im)
image_list = to_image_list(image, cfg.DATALOADER.SIZE_DIVISIBILITY)
image_list = image_list.to('cpu')
print(type(image))
print(image.size())
y = image.unsqueeze(0)  # add a batch dimension: [3, 800, 800] -> [1, 3, 800, 800]
print(y)
print('going in..')
traced_script_module = torch.jit.trace(torch_model, y)
print('done')

This is the error I get when I run trace.py:
Traceback (most recent call last):
  File "/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py", line 595, in run_mod_and_filter_tensor_outputs
    outs = wrap_retval(mod(*_clone_inputs(inputs)))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 24 and 195 in dimension 0 at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/TH/generic/THTensor.cpp:689
The above operation failed in interpreter, with the following stack trace:
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/poolers.py(96): convert_to_roi_format
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/poolers.py(110): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_feature_extractors.py(60): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py(70): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py(39): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py(56): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py(904): trace_module
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py(772): trace
trace.py(117):

Hi,

It looks like the error comes from the ROI pooling layer. Is it possible that the input you give when tracing is not the right size? You don't show the input sizes in your logs.
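The error message itself points at a concatenation along dimension 1: torch.cat requires every input to agree in all dimensions except the one being concatenated. A pooler typically builds its RoI tensor by prepending a batch-index column to the stacked boxes, so the index column and the boxes must have the same number of rows. The sketch below assumes convert_to_roi_format works roughly this way; the helper to_rois and its num_ids_per_image parameter are hypothetical, not maskrcnn-benchmark code. It reproduces the same class of error when the two row counts disagree (24 vs 195, as in the trace):

```python
import torch


def to_rois(boxes_per_image, num_ids_per_image=None):
    """Build an [R, 5] RoI tensor with rows (batch_index, x1, y1, x2, y2).

    Hypothetical helper modeled on a typical pooler. num_ids_per_image lets the
    caller force a stale row count for the index column to mimic a mismatch.
    """
    concat_boxes = torch.cat(boxes_per_image, dim=0)  # [R, 4]
    counts = num_ids_per_image or [len(b) for b in boxes_per_image]
    ids = torch.cat(
        [torch.full((n, 1), float(i)) for i, n in enumerate(counts)], dim=0
    )  # [R', 1], one batch index per box
    # Only valid when R' == R: all dims except dim 1 must match
    return torch.cat([ids, concat_boxes], dim=1)


boxes = [torch.rand(195, 4)]  # one image with 195 boxes
rois = to_rois(boxes)         # index column and boxes both have 195 rows

try:
    to_rois(boxes, num_ids_per_image=[24])  # index column has 24 rows, boxes 195
except RuntimeError as err:
    print("RuntimeError:", err)
```

So the question becomes why the two tensors fed to that concatenation end up with different row counts only when tracing.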

@albanD: I am passing "y" to torch.jit.trace; y has size torch.Size([1, 3, 800, 800]). If I pass the "image" variable to jit.trace instead, image has size torch.Size([3, 800, 800]).

Note that the error is the same in both cases, whether I pass "y" or "image" to jit.trace.

P.S.: If I run the code without tracing, it works fine, and the input image has size torch.Size([3, 800, 800]).

Hi,

I think the inputs are expected to be a tuple, so shouldn't you pass (y,)? I'm not sure that changes anything, though.
Have you checked that, in the code sample you gave above, replacing torch.jit.trace(torch_model, y) with torch_model(y) works fine?
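torch.jit.trace takes its example inputs as a tuple, and a single tensor is also accepted and wrapped into a one-element tuple, so passing (y,) or y should behave the same. The eager-versus-traced check suggested above can be done by running both on the same input and comparing the outputs. A minimal sketch on a small stand-in model; TinyNet is hypothetical, not the Mask R-CNN model:

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model with a fixed-shape output."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pool(torch.relu(self.conv(x))).flatten(1)  # [N, 8]


model = TinyNet().eval()
y = torch.rand(1, 3, 64, 64)  # batch of one, like y in trace.py but smaller

with torch.no_grad():
    eager_out = model(y)                   # plain forward pass
    traced = torch.jit.trace(model, (y,))  # example inputs passed as a tuple
    traced_out = traced(y)

# For a model without data-dependent control flow the two paths should agree
print(torch.allclose(eager_out, traced_out))
```

If the eager call succeeds but the traced one fails on the same input, the problem lies in what the tracer recorded rather than in the input itself.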

@albanD: With (y,) the error stays the same. However, calling torch_model(y) instead of torch.jit.trace(torch_model, y) works fine.

Could you provide a small code sample (30 to 40 lines) that reproduces the issue, so that I can test it locally?
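For reference, a self-contained repro in the spirit of that request might look like the sketch below. It does not use maskrcnn-benchmark and only illustrates one plausible mechanism, which is an assumption rather than a confirmed diagnosis: when a Python int derived from a tensor's size (here via len()) feeds a tensor constructor, torch.jit.trace records that int as a constant. The traced graph then only works when the data-dependent row count matches the one seen during tracing, which yields the same "Sizes of tensors must match except in dimension 1" failure even though the eager model runs fine. KeepPositive is a hypothetical module:

```python
import torch
import torch.nn as nn


class KeepPositive(nn.Module):
    """Hypothetical module whose output row count depends on the input values."""

    def forward(self, scores: torch.Tensor) -> torch.Tensor:
        kept = scores[scores > 0]  # data-dependent number of rows
        # len() returns a plain Python int, so the tracer bakes it in as a
        # constant (PyTorch typically emits a TracerWarning here)
        ids = torch.full((len(kept), 1), 0.0)
        return torch.cat([ids, kept.unsqueeze(1)], dim=1)  # [num_kept, 2]


model = KeepPositive().eval()
trace_input = torch.tensor([1.0, -1.0, 2.0, 3.0])      # 3 positive scores
other_input = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])  # 5 positive scores

traced = torch.jit.trace(model, (trace_input,))

print(model(other_input).shape)  # eager: adapts to 5 rows
try:
    traced(other_input)  # ids still has 3 rows while kept has 5
except RuntimeError as err:
    print("traced module failed:", str(err).splitlines()[0])
```

Deriving sizes as tensors (or scripting the affected function with torch.jit.script) avoids baking the count into the graph, but whether that applies to the pooler here would need to be checked against the actual code.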