Unable to trace MaskRCNN model to TorchScript

Hello,

I’m running PyTorch 1.11.0+cu113 and TorchVision 0.12.0+cu113.

I have found some questions about this specific error, but I’m still not able to fix my issue. I trained a torchvision MaskRCNN model and would like to convert it into a TorchScript module so I can load it with the LibTorch C++ library.

I have the following code:

import torch

def load_model(file_path, device):
    model_instance = ...
    return model_instance.model.to(device)  # MaskRCNN object

device = torch.device("cuda")
model_path = '../../models/my_model.ckpt'
model = load_model(model_path, device)

example = torch.rand(8, 3, 1024, 1024).to(device).half()
traced_script_module = torch.jit.trace(model.eval().half(), example)
traced_script_module.save(f"../../models/{model_name}.pt")

But I get the error below:

/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/functional.py:3877: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  (torch.floor((input.size(i + 2).float() * torch.tensor(scale_factors[i], dtype=torch.float32)).float()))
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:124: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device),
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:124: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[0] // g[0], dtype=torch.int64, device=device),
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:125: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device),
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/anchor_utils.py:125: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device),
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py:73: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  A = Ax4 // 4
/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py:74: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  C = AxC // A
Traceback (most recent call last):
  File "model_to_torchscript.py", line 44, in <module>
    traced_script_module = torch.jit.trace(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 98, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 353, in forward
    boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 240, in filter_proposals
    top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 215, in _get_top_n_idx
    r.append(top_n_idx + offset)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I also tried:

traced_script_module = torch.jit.script(model)

But I get:

Traceback (most recent call last):
  File "model_to_torchscript.py", line 43, in <module>
    traced_script_module = torch.jit.script(model)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_script.py", line 1265, in script
    return torch.jit._recursive.create_script_module(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 451, in create_script_module
    concrete_type = get_module_concrete_type(nn_module, share_types)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 402, in get_module_concrete_type
    concrete_type = concrete_type_store.get_or_create_concrete_type(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 343, in get_or_create_concrete_type
    concrete_type_builder = infer_concrete_type_builder(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 208, in infer_concrete_type_builder
    sub_concrete_type = get_module_concrete_type(item, share_types)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 402, in get_module_concrete_type
    concrete_type = concrete_type_store.get_or_create_concrete_type(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 343, in get_or_create_concrete_type
    concrete_type_builder = infer_concrete_type_builder(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 208, in infer_concrete_type_builder
    sub_concrete_type = get_module_concrete_type(item, share_types)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 402, in get_module_concrete_type
    concrete_type = concrete_type_store.get_or_create_concrete_type(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 343, in get_or_create_concrete_type
    concrete_type_builder = infer_concrete_type_builder(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 208, in infer_concrete_type_builder
    sub_concrete_type = get_module_concrete_type(item, share_types)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 402, in get_module_concrete_type
    concrete_type = concrete_type_store.get_or_create_concrete_type(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 343, in get_or_create_concrete_type
    concrete_type_builder = infer_concrete_type_builder(nn_module)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 245, in infer_concrete_type_builder
    concrete_type_builder.add_constant(name, _get_valid_constant(name, value, type(nn_module).__name__))
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_recursive.py", line 107, in _get_valid_constant
    raise TypeError(textwrap.dedent("""
TypeError: 
'numpy.int64' object in attribute 'Linear.out_features' is not a valid constant.
Valid constants are:
1. a nn.ModuleList
2. a value of type {bool, float, int, str, NoneType, torch.device, torch.layout, torch.dtype}
3. a list or tuple of (2)

And finally, I tried using a wrapper class:

class TraceWrapper(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input):
        out = self.model(input)
        return out[0]["boxes"], out[0]["scores"], out[0]["labels"], out[0]["masks"]

...

traced_script_module = torch.jit.trace(
    TraceWrapper(model.eval().half()), example)

But then again, after the same UserWarnings as in the first attempt, I get:

Traceback (most recent call last):
  File "model_to_torchscript.py", line 44, in <module>
    traced_script_module = torch.jit.trace(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_trace.py", line 741, in trace
    return trace_module(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_trace.py", line 958, in trace_module
    module._c._create_method_from_trace(
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "model_to_torchscript.py", line 19, in forward
    out = self.model(input)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/generalized_rcnn.py", line 98, in forward
    proposals, proposal_losses = self.rpn(images, features, targets)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1098, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 353, in forward
    boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 240, in filter_proposals
    top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
  File "/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torchvision/models/detection/rpn.py", line 215, in _get_top_n_idx
    r.append(top_n_idx + offset)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I can’t seem to find a way to do it. Is it possible? Any help is much appreciated, thanks!

Try to torch.jit.script the model instead, as shown here:

model = torchvision.models.detection.maskrcnn_resnet50_fpn()
model.cuda()
model.eval()

x = torch.randn(1, 3, 224, 224, device='cuda')
out = model(x)

model = torch.jit.script(model)
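
You could then save the scripted module and load it back in Python as a quick sanity check before moving to LibTorch. A minimal sketch continuing from the snippet above (the file name is just a placeholder):

model.save("maskrcnn_scripted.pt")

# Round-trip check: a scripted torchvision detection model takes a
# List[Tensor] ([C, H, W] each) and, when scripted, always returns a
# (losses, detections) tuple.
loaded = torch.jit.load("maskrcnn_scripted.pt")
with torch.no_grad():
    losses, detections = loaded([torch.randn(3, 224, 224, device="cuda")])
print(detections[0]["boxes"].shape)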

Thanks for your help! Your approach works. One note for other readers: I had initially passed the example input as a second positional argument, torch.jit.script(model, x), which fills the deprecated optimize parameter and triggers this warning, so the example input should simply be left out:

/home/aurelien/Documents/Projects/autotrain-env/lib/python3.8/site-packages/torch/jit/_script.py:1222: UserWarning: `optimize` is deprecated and has no effect. Use `with torch.jit.optimized_execution() instead
  warnings.warn(

However, I found my original problem: when constructing maskrcnn_resnet50_fpn, I was passing a few parameters, and without realising it, one of them was a numpy.int64 instead of a Python int. That is what caused this error:

TypeError: 
'numpy.int64' object in attribute 'Linear.out_features' is not a valid constant.
Valid constants are:
1. a nn.ModuleList
2. a value of type {bool, float, int, str, NoneType, torch.device, torch.layout, torch.dtype}
3. a list or tuple of (2)

Simply casting the arguments to Python ints made torch.jit.script work without any issues:

model = torchvision.models.detection.maskrcnn_resnet50_fpn(
    num_classes=int(num_classes), min_size=int(min_size), max_size=int(max_size)
)
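
For anyone who wants a self-contained reproduction of the failure mode, here is a minimal sketch (the num_classes value is made up; any numpy integer triggers it):

import numpy as np
import torch
import torchvision

num_classes = np.int64(5)  # e.g. the result of a numpy/pandas computation

bad = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=num_classes)
# torch.jit.script(bad)  # raises: 'numpy.int64' object in attribute
#                        # 'Linear.out_features' is not a valid constant

good = torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=int(num_classes))
scripted = torch.jit.script(good)  # scripts cleanly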

Hopefully that can help others hitting similarly strange errors.