RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 24 and 195 in dimension 0 at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/TH/generic/THTensor.cpp:689

Fariha1123 · October 14, 2019, 10:05am

I am trying to use the tracing mechanism (torch.jit.trace) to trace my Python model. Below is my nn.Module:

class GeneralizedRCNN(nn.Module):
    """
    Main class for Generalized R-CNN. Currently supports boxes and masks.
    It consists of three main parts:
    - backbone
    - rpn
    - heads: takes the features + the proposals from the RPN and computes
        detections / masks from it.
    """

    def __init__(self, cfg):
        super(GeneralizedRCNN, self).__init__()

        self.backbone = build_backbone(cfg)
        self.rpn = build_rpn(cfg, self.backbone.out_channels)
        self.roi_heads = build_roi_heads(cfg, self.backbone.out_channels)

    def forward(self, images, targets=None):
        """
        Arguments:
            images (list[Tensor] or ImageList): images to be processed
            targets (list[BoxList]): ground-truth boxes present in the image (optional)

        Returns:
            result (list[BoxList] or dict[Tensor]): the output from the model.
                During training, it returns a dict[Tensor] which contains the losses.
                During testing, it returns list[BoxList] contains additional fields
                like `scores`, `labels` and `mask` (for Mask R-CNN models).

        """
        if self.training and targets is None:
            raise ValueError("In training mode, targets should be passed")
        images = to_image_list(images)
        features = self.backbone(images.tensors)
        proposals, proposal_losses = self.rpn(images, features, targets)
        if self.roi_heads:
            print(type(features))
            print(type(proposals))
            print(type(targets))
            x, result, detector_losses = self.roi_heads(features, proposals, targets)
        else:
            # RPN-only models don't have roi_heads
            x = features
            result = proposals
            detector_losses = {}

        if self.training:
            losses = {}
            losses.update(detector_losses)
            losses.update(proposal_losses)
            return losses

        print("result.....")
        print(result[0].bbox.size())
        print(result[0].bbox)
        return result[0].bbox

When I run the file, the print statements at the end of forward produce the output below:

result.....
torch.Size([24, 4])
tensor([[0.0000e+00, 6.6122e+02, 4.0417e+01, 7.1402e+02],
        [0.0000e+00, 7.7747e+02, 9.3248e+01, 7.9861e+02],
        [1.9836e+02, 1.5207e+02, 3.1438e+02, 3.3421e+02],
        [0.0000e+00, 1.1651e+02, 6.4819e+01, 3.5847e+02],
        [1.5965e+02, 7.6236e+02, 3.4225e+02, 7.9900e+02],
        [6.8618e+02, 1.1461e-01, 7.9371e+02, 3.6082e+01],
        [0.0000e+00, 0.0000e+00, 5.6068e+01, 7.0202e+01],
        [6.5861e+02, 7.8178e+01, 7.9319e+02, 3.5219e+02],
        [1.3673e+02, 9.9877e+01, 3.1141e+02, 3.3045e+02],
        [1.3768e+02, 2.6786e+02, 2.3104e+02, 3.3459e+02],
        [6.5975e+02, 5.7610e+02, 6.7409e+02, 5.9613e+02],
        [3.5417e+02, 3.3771e+02, 5.1339e+02, 4.6793e+02],
        [6.6992e+02, 2.6569e+02, 7.8911e+02, 4.4689e+02],
        [7.6212e+02, 7.6213e+02, 7.8742e+02, 7.8259e+02],
        [1.3392e+02, 3.3078e+02, 3.3120e+02, 4.6792e+02],
        [6.6482e+02, 6.3342e+02, 6.8739e+02, 6.5951e+02],
        [3.7355e+02, 1.8197e+02, 6.2639e+02, 4.5210e+02],
        [3.3505e+02, 9.4824e+01, 5.5984e+02, 2.3227e+02],
        [4.9010e+02, 2.6308e+02, 6.2003e+02, 4.6466e+02],
        [3.5421e+02, 2.6553e+02, 5.7457e+02, 4.8271e+02],
        [6.8232e+02, 1.7231e+02, 7.9900e+02, 4.2151e+02],
        [3.6076e+02, 1.1569e+02, 5.4964e+02, 3.8680e+02],
        [5.8361e+02, 5.9198e+02, 6.5042e+02, 6.5802e+02],
        [0.0000e+00, 3.4766e+02, 7.7981e+01, 4.8119e+02]])

The script I run, trace.py, is below:

print('loading...')
torch_model = build_detection_model(cfg.clone())
# Load the checkpoint onto the CPU regardless of where it was saved.
state_dict = torch.load(model_path, map_location=torch.device("cpu"))
# print(state_dict)
load_state_dict(torch_model, state_dict.pop("model"))
torch_model.eval()
print('model loaded and state dict applied successfully')
im = loadImg(sample_input)
image = transforms(im)
image_list = to_image_list(image, cfg.DATALOADER.SIZE_DIVISIBILITY)
image_list = image_list.to('cpu')
print(type(image))
print(image.size())
y = torch.Tensor(1,3,800,800)
y[0] = image
print(y)
print('going in..')
traced_script_module = torch.jit.trace(torch_model, y)
print('done')

This is the error I get when I run trace.py:
Traceback (most recent call last):
  File "/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py", line 595, in run_mod_and_filter_tensor_outputs
    outs = wrap_retval(mod(*_clone_inputs(inputs)))
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 1. Got 24 and 195 in dimension 0 at /opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/TH/generic/THTensor.cpp:689
The above operation failed in interpreter, with the following stack trace:
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/poolers.py(96): convert_to_roi_format
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/poolers.py(110): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/mask_head/roi_mask_feature_extractors.py(60): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/mask_head/mask_head.py(70): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/roi_heads/roi_heads.py(39): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/mnt/d/work/MASKRCNN/maskrcnn-benchmark/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py(56): forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(531): _slow_forward
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/nn/modules/module.py(545): __call__
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py(904): trace_module
/home/fariha/anaconda3/envs/maskrcnn_benchmark/lib/python3.7/site-packages/torch/jit/__init__.py(772): trace
trace.py(117):
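
For readers following along: the innermost frame, poolers.py(96) in convert_to_roi_format, together with the message text, suggests a concatenation along dimension 1 of two tensors whose row counts differ. The snippet below is a minimal sketch of that kind of failure, not the maskrcnn-benchmark code. The names `ids` and `boxes` and their shapes are assumptions chosen only to mirror the numbers 24 and 195 in the message.

```python
import torch

# Hypothetical shapes that mirror the error message; not taken from the real model.
ids = torch.zeros(24, 1)     # e.g. one image-index column per box
boxes = torch.zeros(195, 4)  # box coordinates with a different row count

error_message = None
try:
    # Concatenating along dim=1 requires all other dimensions to match.
    torch.cat([ids, boxes], dim=1)
except RuntimeError as err:
    error_message = str(err)
    print(error_message)

# With matching row counts the same call succeeds.
rois_ok = torch.cat([torch.zeros(24, 1), torch.zeros(24, 4)], dim=1)
print(rois_ok.shape)
```

The exact wording of the message varies between PyTorch versions, but the condition is the same: every dimension other than the concatenation dimension must agree.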

albanD · October 14, 2019, 2:45pm

Hi,

It looks like the error comes from the ROI pooling layer. Could the input you pass when tracing be the wrong size? You don't specify the input sizes in your logs.

Fariha1123 · October 16, 2019, 4:49am

@albanD: I am passing "y" to torch.jit.trace; y has size torch.Size([1, 3, 800, 800]).
If I pass the "image" variable to jit.trace instead, image has size torch.Size([3, 800, 800]).

Also, please note that the error is the same in both cases, whether I pass "y" or "image" to jit.trace.

P.S.: If I run the code without tracing, it works fine, and the input image has size torch.Size([3, 800, 800]).

albanD · October 16, 2019, 3:06pm

Hi,

I think the inputs are expected to be a tuple, so you should pass (y,), no? I am not sure that changes anything, though.
Have you checked that, in the code sample you gave above, replacing torch.jit.trace(torch_model, y) with torch_model(y) works fine?
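
The two checks suggested here can be sketched on a toy module. This is a minimal sketch, assuming a small stand-in network: `TinyNet` and the 32x32 input are hypothetical and are not the model from this thread. It passes the example inputs as a tuple and compares the plain (eager) call against the traced one.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real detection model."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, x):
        # Average over the spatial dimensions to get one vector per image.
        return self.conv(x).mean(dim=(2, 3))


model = TinyNet().eval()
y = torch.rand(1, 3, 32, 32)

with torch.no_grad():
    eager_out = model(y)                   # plain call, no tracing
    traced = torch.jit.trace(model, (y,))  # example inputs passed as a tuple
    traced_out = traced(y)

# If the eager call works and the traced call agrees, the input itself is fine.
outputs_match = torch.allclose(eager_out, traced_out, atol=1e-6)
print(outputs_match)
```

If the eager call succeeds but tracing fails, as reported later in the thread, the input shape is unlikely to be the problem and the difference lies in how the forward pass behaves under the tracer.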

Fariha1123 · October 17, 2019, 6:41am

@albanD: After passing (y,), the error remains the same. However, calling torch_model(y) instead of torch.jit.trace(torch_model, y) works fine.
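
One mechanism that can make code work eagerly but fail once traced is that torch.jit.trace records Python-level values, such as an int derived from a tensor, as constants. Whether that is what happens inside maskrcnn-benchmark is not established in this thread, so the following is only an illustrative sketch under that assumption. The function `tag_positive` and its inputs are hypothetical.

```python
import warnings

import torch


def tag_positive(x):
    """Hypothetical function: pair each positive value with a zero 'id' column."""
    kept = x[x > 0]                 # number of rows depends on the data
    n = int((x > 0).sum())          # Python int: the tracer records it as a constant
    ids = torch.zeros(n, 1)
    return torch.cat([ids, kept.unsqueeze(1)], dim=1)


example = torch.tensor([1.0, -1.0, 2.0, 3.0])     # 3 positive values
other = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0])   # 5 positive values

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the TracerWarning for this demo
    traced = torch.jit.trace(tag_positive, example)

eager_shape = tuple(tag_positive(other).shape)    # eager mode adapts to the new input
print(eager_shape)

trace_failed = False
try:
    traced(other)                    # the traced graph still builds a 3-row ids tensor
except RuntimeError as err:
    trace_failed = True
    print(err)
```

In eager mode the row count is recomputed on every call, so the shapes always agree. In the traced graph the value of `n` is frozen at 3, so an input with a different number of positive values produces a size mismatch in the concatenation.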

albanD · October 17, 2019, 2:32pm

Could you put together a small code sample (30 to 40 lines) that reproduces the issue, please, so that I can test it locally?