How much work is required to make torchvision FPN scriptable?

I’m aware of two things that block the torchvision FPN (feature pyramid network) from being converted to TorchScript:

  • The default output indices in IntermediateLayerGetter of resnet_fpn_backbone have mixed types: both integers (1, 2, 3, 4) and a string (‘pooling’) are used, as the following error shows:
Traceback (most recent call last):
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 75, in copy_to_script_module
    script_module._c._register_attribute(name, the_type, item)
RuntimeError: Unable to cast Python instance of type <class 'int'> to C++ type 'std::string'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/peng/git-fork/__ME__/vision/test/test_backbone_utils.py", line 36, in test_resnet18_fpn_backbone
    script = torch.jit.script(wHead)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/__init__.py", line 1203, in script
    return torch.jit.torch.jit._recursive.recursive_script(obj)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 173, in recursive_script
    return copy_to_script_module(mod, overload_stubs + stubs)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 95, in copy_to_script_module
    torch.jit._create_methods_from_stubs(script_module, stubs)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/__init__.py", line 1423, in _create_methods_from_stubs
    self._c._create_methods(self, defs, rcbs, defaults)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 195, in make_strong_submodule
    new_strong_submodule = recursive_script(module)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 116, in recursive_script
    return create_constant_iterable_module(mod)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 233, in create_constant_iterable_module
    modules[key] = recursive_script(submodule)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 173, in recursive_script
    return copy_to_script_module(mod, overload_stubs + stubs)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 80, in copy_to_script_module
    raise RuntimeError(msg)
RuntimeError: When compiling <class 'torchvision.models._utils.IntermediateLayerGetter'>, could not register attribute return_layers of type Dict[str, str] with value {'layer1': 0, 'layer2': 1, 'layer3': 2, 'layer4': 3}
Original error: Unable to cast Python instance of type <class 'int'> to C++ type 'std::string'
  • The output of FPN is a dict/OrderedDict, which is not supported by TorchScript, as the following error shows:
Traceback (most recent call last):
  File "/home/peng/git-fork/__ME__/vision/test/test_backbone_utils.py", line 36, in test_resnet18_fpn_backbone
    script = torch.jit.script(resnet18_fpn)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/__init__.py", line 1203, in script
    return torch.jit.torch.jit._recursive.recursive_script(obj)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 116, in recursive_script
    return create_constant_iterable_module(mod)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 233, in create_constant_iterable_module
    modules[key] = recursive_script(submodule)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 173, in recursive_script
    return copy_to_script_module(mod, overload_stubs + stubs)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/_recursive.py", line 95, in copy_to_script_module
    torch.jit._create_methods_from_stubs(script_module, stubs)
  File "/home/shared/conda3/envs/cv-torch/lib/python3.7/site-packages/torch/jit/__init__.py", line 1423, in _create_methods_from_stubs
    self._c._create_methods(self, defs, rcbs, defaults)
RuntimeError: 
Arguments for call are not valid.
The following operator variants are available:
  
  aten::keys(Dict(str, t) self) -> (str[](*)):
  Could not match type Tensor to Dict[str, t] in argument 'self': Cannot match a dict to Tensor.
  
  aten::keys(Dict(int, t) self) -> (int[](*)):
  Could not match type Tensor to Dict[int, t] in argument 'self': Cannot match a dict to Tensor.
  
  aten::keys(Dict(float, t) self) -> (float[](*)):
  Could not match type Tensor to Dict[float, t] in argument 'self': Cannot match a dict to Tensor.
  
  aten::keys(Dict(Tensor, t) self) -> (Tensor[](*)):
  Could not match type Tensor to Dict[Tensor, t] in argument 'self': Cannot match a dict to Tensor.

The original call is:
at /home/peng/git-fork/__ME__/vision/torchvision/ops/feature_pyramid_network.py:80:21

        Arguments:
            x (OrderedDict[Tensor]): feature maps for each feature level.

        Returns:
            results (OrderedDict[Tensor]): feature maps after FPN layers.
                They are ordered from highest resolution first.
        """
        # unpack OrderedDict into two lists for easier handling
        names = list(x.keys())
                     ~~~~~~ <--- HERE
        x = list(x.values())

        last_inner = self.inner_blocks[-1](x[-1])
        results = []
        results.append(self.layer_blocks[-1](last_inner))
        for feature, inner_block, layer_block in zip(
            x[:-1][::-1], self.inner_blocks[:-1][::-1], self.layer_blocks[:-1][::-1]
        ):
            if not inner_block:

However, after fixing both issues (the first by always using ‘1’, ‘2’, ‘3’, … instead of 1, 2, 3 directly, and the second by adding a wrapper nn.Module that only extracts layer ‘1’), I still encounter an error with a traceback similar to the second one. Why is this still the case, and what should I do to fix it?
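For reference, the first workaround amounts to roughly the following. This is only a sketch: `stringify_return_layers` is a hypothetical helper name I made up for illustration, not a torchvision function, and the input dict is the one shown in the first error message.

```python
from typing import Dict, Mapping, Union


def stringify_return_layers(
    return_layers: Mapping[str, Union[int, str]]
) -> Dict[str, str]:
    """Hypothetical helper: coerce every output name to str so the
    attribute can be registered as Dict[str, str]."""
    return {str(name): str(out_name) for name, out_name in return_layers.items()}


# The mixed-type mapping from the first error message.
fixed = stringify_return_layers(
    {"layer1": 0, "layer2": 1, "layer3": 2, "layer4": 3}
)
print(fixed)
```

With every value a string, the `Unable to cast Python instance of type <class 'int'> to C++ type 'std::string'` error no longer appears for me.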

If the cause turns out to be that a dict output cannot exist anywhere in the model, that will be the nail in the coffin, since all major FPN implementations (torchvision, detectron2) contain at least one dict output at some point.

Otherwise I’ll be happy to submit my code & tests as a pull request.

This PR, which makes it scriptable, will hopefully land soon: https://github.com/pytorch/vision/pull/1407
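As for why the second error persists: the message `Could not match type Tensor to Dict[str, t]` suggests that TorchScript treated the `forward` argument as a Tensor, which is what it assumes for any argument without a type annotation. Dict inputs and outputs are fine once they are annotated. Below is a minimal sketch of that idea, not the code from the PR; `TinyPyramid` is a hypothetical stand-in for an FPN-like module.

```python
from typing import Dict

import torch
from torch import Tensor, nn


class TinyPyramid(nn.Module):
    """Hypothetical FPN-like module (not torchvision's implementation)."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(4, 4, kernel_size=1)

    def forward(self, x: Dict[str, Tensor]) -> Dict[str, Tensor]:
        # Without the Dict[str, Tensor] annotation, TorchScript assumes
        # `x` is a Tensor and the x.keys() call fails to compile.
        names = list(x.keys())
        feats = list(x.values())
        out: Dict[str, Tensor] = {}
        for i in range(len(names)):
            out[names[i]] = self.conv(feats[i])
        return out


scripted = torch.jit.script(TinyPyramid())
features = {"0": torch.rand(1, 4, 8, 8), "1": torch.rand(1, 4, 4, 4)}
result = scripted(features)
```

Here the scripted module both accepts and returns a dict keyed by strings, so a dict output by itself should not rule out scripting.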

Thanks a lot!

Wow, that blows my mind.