Error loading a trained Mask R-CNN model with torch.jit

NAEE09 · November 23, 2020, 6:12pm

Hi!
I’m working through the TorchVision Object Detection Finetuning Tutorial in Colab (https://pytorch.org/tutorials/intermediate/torchvision_tutorial.html) and save the trained model using torch.jit. If I load the model in the same notebook after training and saving it, I don’t have any problem, but if I try to load it in another notebook or on my PC, it doesn’t work.

The error in Colab is:

RuntimeError                              Traceback (most recent call last)

<ipython-input-18-c0fa55fc421f> in <module>()
----> 1 model_loaded = torch.jit.load("/content/MRCNN_eval.pt")

/usr/local/lib/python3.6/dist-packages/torch/jit/_serialization.py in load(f, map_location, _extra_files)
    159     cu = torch._C.CompilationUnit()
    160     if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161         cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
    162     else:
    163         cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

And the error on my PC:
PyTorch version: 1.6.0
torchvision version: 0.7.0

Traceback (most recent call last):
  File "mrcnn_inference.py", line 17, in <module>
    model_loaded = torch.jit.load("/home/nae/ML/LoadModelPt/build/Models/MRCNN_model.pt", map_location = 'cpu')
  File "/home/nae/.local/lib/python3.6/site-packages/torch/jit/__init__.py", line 275, in load
    cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
RuntimeError: 
Arguments for call are not valid.
The following variants are available:
  
  aten::upsample_nearest1d.out(Tensor self, int[1] output_size, float? scales=None, *, Tensor(a!) out) -> (Tensor(a!)):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.
  
  aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor):
  Expected a value of type 'List[int]' for argument 'output_size' but instead found type 'Optional[List[int]]'.

The original call is:
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 3130

    if input.dim() == 3 and mode == 'nearest':
        return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    if input.dim() == 4 and mode == 'nearest':
        return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
Serialized   File "code/__torch__/torch/nn/functional/___torch_mangle_46.py", line 155
    _49 = False
  if _49:
    _51 = torch.upsample_nearest1d(input, output_size3, scale_factors6)
          ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _50 = _51
  else:
'interpolate' is being compiled since it was called from '_resize_image_and_masks'
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 227
    self_max_size: float,
    target: Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]:
  _77 = __torch__.torch.nn.functional.___torch_mangle_46.interpolate
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
  _78 = torch.slice(torch.size(image), -2, 9223372036854775807, 1)
  im_shape = torch.tensor(_78, dtype=None, device=None, requires_grad=False)
'_resize_image_and_masks' is being compiled since it was called from 'GeneralizedRCNNTransform.resize'
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 96
    target: Optional[Dict[str, Tensor]]) -> Tuple[Tensor, Optional[Dict[str, Tensor]]]:
    _25 = "This Python function is annotated to be ignored and cannot be run"
    _26 = __torch__.torchvision.models.detection.transform._resize_image_and_masks
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
    _27 = __torch__.torchvision.models.detection.transform.resize_boxes
    _28 = __torch__.torchvision.models.detection.transform.resize_keypoints
'GeneralizedRCNNTransform.resize' is being compiled since it was called from 'GeneralizedRCNNTransform.forward'
  File "/usr/local/lib/python3.6/dist-packages/torchvision/models/detection/transform.py", line 105
                                 "of shape [C, H, W], got {}".format(image.shape))
            image = self.normalize(image)
            image, target_index = self.resize(image, target_index)
                                  ~~~~~~~~~~~ <--- HERE
            images[i] = image
            if targets is not None and target_index is not None:
Serialized   File "code/__torch__/torchvision/models/detection/transform.py", line 45
        pass
      image0 = (self).normalize(image, )
      _8 = (self).resize(image0, target_index, )
                                 ~~~~~~~~~~~~ <--- HERE
      image1, target_index0, = _8
      _9 = torch._set_item(images0, i, image1)

ptrblck · November 25, 2020, 8:59am

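The error might be caused by an incompatibility between PyTorch versions, assuming the Colab PyTorch version was 1.7.0 while you are using 1.6.0 locally. The local traceback points that way: the serialized interpolate code calls upsample_nearest1d with an Optional[List[int]] output_size, which the 1.6.0 operator schema does not accept.
Could you update your local installation and try to load the file again?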
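To confirm which versions each environment actually runs, a minimal sketch like the one below can be executed both in Colab and locally. It reads the installed distribution versions through the standard library's importlib.metadata instead of importing torch, so it still runs where a package is missing; the helper name installed_versions is hypothetical. In a working environment, printing torch.__version__ and torchvision.__version__ should normally report the same information.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision")):
    """Map each distribution name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in both places and comparing the two outputs side by side shows immediately whether the saving and loading environments differ.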

NAEE09 · November 25, 2020, 9:43am

Thanks for your reply. I changed the Colab PyTorch version to 1.6.0, and it still doesn’t work. I had already tried changing the versions, and when I load the model in another notebook, it doesn’t work either.
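The Colab error is raised before any model code runs: PytorchStreamReader could not find the zip central directory. Files written by torch.jit.save are zip archives, so one hedged reading is that the .pt file itself was incomplete or damaged (for example, copied or downloaded before it was fully written) rather than mismatched across versions. Under that assumption, a quick stdlib-only check of the file before calling torch.jit.load could look like this; check_torchscript_archive is a hypothetical helper, not part of PyTorch.

```python
import os
import zipfile


def check_torchscript_archive(path):
    """Return a short diagnosis of the file at `path` before torch.jit.load sees it.

    Assumption: files written by torch.jit.save are zip archives, so a file the
    standard zipfile module cannot open was most likely truncated or corrupted.
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    # is_zipfile looks for the end-of-central-directory record, which is what
    # locates the central directory PytorchStreamReader failed to find.
    if not zipfile.is_zipfile(path):
        return "not a zip archive (possibly truncated)"
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # first member failing its CRC check, or None
    return "ok" if bad_member is None else f"corrupt member: {bad_member}"
```

If the helper returns "ok", torch.jit.load(path, map_location="cpu") is the next step; any other result suggests saving or copying the file again rather than changing PyTorch versions.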

ptrblck · November 25, 2020, 9:46am

Did you try updating your local installation to the latest version instead of downgrading the Colab binary?
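The suggestion follows a general rule of thumb: a TorchScript file scripted with a newer PyTorch may call operator overloads that an older runtime lacks (here, upsample_nearest1d with an optional output_size), so the loading environment should be at least as new as the saving one. A minimal sketch of that comparison is shown below. It assumes plain release strings such as "1.7.0+cu101" and is not a full PEP 440 parser (the third-party packaging.version module handles pre-releases properly). Both helper names are hypothetical.

```python
import re


def parse_release(version, width=3):
    """Turn a version string such as '1.7.0+cu101' into a tuple like (1, 7, 0).

    Assumption: only the numeric release part matters; a local tag after '+'
    is dropped and non-numeric pieces count as 0. Not a full PEP 440 parser.
    """
    release = version.split("+", 1)[0]
    parts = []
    for piece in release.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad or trim to a fixed width so '1.7' and '1.7.0' compare as equal.
    return tuple((parts + [0] * width)[:width])


def loader_is_older(saved_with, loading_with):
    """True if the loading PyTorch release is older than the one that saved the file."""
    return parse_release(loading_with) < parse_release(saved_with)


print(loader_is_older("1.7.0+cu101", "1.6.0"))  # → True
```

To have the saving version at hand, it could be recorded at save time with torch.jit.save(..., _extra_files={"torch_version.txt": torch.__version__}) and read back through the _extra_files argument of torch.jit.load, or simply noted next to the file.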

NAEE09 · November 25, 2020, 10:17am

Yes, I also tried that.

NAEE09 · December 2, 2020, 9:58am

Finally, it works. I’m not sure what the problem was, but I did the training on my local PC, and I can load that model without any problem.
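Training and saving on the machine that loads the model avoids both copying the .pt file between environments and any PyTorch version mismatch. If the file has to move between Colab and a local machine again, one hedged way to rule out a damaged transfer is to compare SHA-256 digests of the file in both places, assuming a digest is computed right after torch.jit.save in the source environment. A minimal stdlib sketch follows; file_sha256 is a hypothetical helper.

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file,
        # so large checkpoints never have to fit in memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Matching digests on both sides mean the bytes arrived intact, which leaves version differences as the remaining suspect; differing digests mean the copy should be redone before debugging anything else.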