Jit.trace not working for detecto model

draaken0 · September 23, 2020, 9:56am

I am using a Detecto model trained on a custom dataset, but when I try to use that model with torch.jit.trace, it throws an error.

Code:

from detecto.core import Model
from detecto.utils import read_image
import torch

model = Model.load(model_path, classes)

example = torch.rand(1, 3, 224, 224)
model_trace = torch.jit.trace(model, example)

Error:

Traceback (most recent call last):
  File "c:/Users/shehr/Desktop/Android Project/Android_Model.py", line 21, in <module>
    model_trace = torch.jit.script(model, example)
  File "C:\Users\shehr\Anaconda3\envs\tf_gpu\lib\site-packages\torch\jit\__init__.py", line 1257, in script
    qualified_name = _qualified_name(obj)
  File "C:\Users\shehr\Anaconda3\envs\tf_gpu\lib\site-packages\torch\_jit_internal.py", line 682, in _qualified_name
    name = obj.__name__
AttributeError: 'Model' object has no attribute '__name__'

Please help me out

tom · September 23, 2020, 5:31pm

That’s because detecto’s Model objects are “unrelated” wrappers around TorchVision’s models rather than nn.Modules. You might have some luck wrapping the model in a function and tracing that (torch.jit.trace(lambda x: model(x), example)).
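To illustrate the point about wrappers, here is a minimal sketch. PlainWrapper is a hypothetical stand-in written for this example, not Detecto's actual class; it only mimics the pattern of holding a network in an attribute instead of subclassing nn.Module.

```python
import torch
from torch import nn


class PlainWrapper:
    """Hypothetical stand-in for a wrapper class that is NOT an nn.Module."""

    def __init__(self):
        # The real network is stored as an attribute rather than inherited from.
        self._model = nn.Linear(4, 2)

    def predict(self, x):
        with torch.no_grad():
            return self._model(x)


wrapper = PlainWrapper()

is_module = isinstance(wrapper, nn.Module)                # the wrapper itself
is_callable = callable(wrapper)                           # no __call__ defined
has_name = hasattr(wrapper, "__name__")                   # instances have no __name__
inner_is_module = isinstance(wrapper._model, nn.Module)   # the wrapped network

print(is_module, is_callable, has_name, inner_is_module)
# prints: False False False True
```

An object like this is neither a module nor a callable, so the JIT entry points have nothing they know how to handle; only the wrapped attribute is a regular PyTorch module.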

Best regards

Thomas

draaken0 · September 24, 2020, 4:45am

Thank you, Thomas, for the reply.
I tried wrapping it in a function as you suggested, but it still didn’t work. The only option left for me is to retrain a model based on nn.Module.

tom · September 24, 2020, 10:10am

Well, what error are you getting?

The other option could be to unwrap detecto’s model (which, as far as I can see, is just one of TorchVision’s pretrained detection models by default).

draaken0 · September 24, 2020, 10:30am

This time I am getting “‘Model’ object is not callable”. And yes, it is the pre-trained Faster R-CNN model from torchvision. I looked into the Detecto source code: they load the pre-trained model, remove its last layer, and add a custom output layer with as many output nodes as the number of classes we pass in. All of this happens in the Model class they created.

Code

class Model:

    def __init__(self, classes=None, device=None):

        self._device = device if device else config['default_device']

        # Load a model pre-trained on COCO
        self._model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

        if classes:
            # Get the number of input features for the classifier
            in_features = self._model.roi_heads.box_predictor.cls_score.in_features
            # Replace the pre-trained head with a new one (note: +1 because of the __background__ class)
            self._model.roi_heads.box_predictor = FastRCNNPredictor(in_features, len(classes) + 1)
            self._disable_normalize = False
        else:
            classes = config['default_classes']
            self._disable_normalize = True

        self._model.to(self._device)

        # Mappings to convert from string labels to ints and vice versa
        self._classes = ['__background__'] + classes
        self._int_mapping = {label: index for index, label in enumerate(self._classes)}

tom · September 24, 2020, 10:34am

Yeah, sorry, you’d need model.predict(…) in the lambda rather than model(…)

draaken0 · September 24, 2020, 11:29am

Still not working. I don’t have much experience with PyTorch as I am new to it, and now I am stuck. I just wanted to use this model in an Android app, but now it seems I have to drop this. I followed the instructions on the PyTorch website: load the trained model, trace it to make it executable on Android, and run it.

import detecto.core
import torch

model = detecto.core.Model.load(model_path, classlist)
example = torch.rand(3, 224, 224)
model_trace = torch.jit.trace(model, example)
model_trace.save(save_path + 'traced_model')

I just don’t want to retrain the model because it takes a lot of time, but I will have to.

tom · September 24, 2020, 11:49am

The problem is that a detecto Model isn’t a PyTorch model, so there is a mismatch there.
But you could just grab the model._model if you wanted to. That is a PyTorch model.
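A minimal sketch of that unwrapping step, using a hypothetical PlainWrapper and a small convolutional network in place of the Detecto model (both are assumptions made so the example runs without Detecto or pretrained weights). It shows only the mechanics of tracing the inner module; whether a particular network traces cleanly is a separate question.

```python
import torch
from torch import nn


class PlainWrapper:
    """Hypothetical wrapper that keeps an nn.Module in `_model`."""

    def __init__(self):
        self._model = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )


wrapper = PlainWrapper()

# Grab the wrapped nn.Module and switch it to inference mode before tracing.
inner = wrapper._model
inner.eval()

example = torch.rand(1, 3, 32, 32)
traced = torch.jit.trace(inner, example)

# The traced module should reproduce the eager output on the example input.
with torch.no_grad():
    same = torch.allclose(traced(example), inner(example))
print(same)
```

Because the inner module is the object that holds the trained weights, tracing it does not require retraining anything.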

draaken0 · September 24, 2020, 12:08pm

I could do that, but I have already trained a model on a custom dataset using Detecto, so I wanted to use that model. Anyway, thanks, Thomas, for clearing this up. Next time, I will just use an existing PyTorch model to avoid this kind of problem.

draaken0 · October 6, 2020, 9:06am

I found a workaround for the issue I was facing earlier.

model = Model.load(model_path, classes)
int_model = model.get_internal_model(model)
int_model.eval()
example = torch.rand(1, 3, 224, 224)
model_trace = torch.jit.trace(int_model, example)

But now I am getting another error on this line:

model_trace = torch.jit.trace(int_model, example)

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same

It seems my model is on the GPU but the input is on the CPU.
I tried changing “example” to example.to('cuda'),
but then it says:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Any help with this?

Shisho_Sama · October 6, 2020, 9:46am

Doing example = example.cuda() should suffice.
Or, if you want to use .to, make sure you store the result back (example = example.to('cuda')).
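A small sketch of why the result has to be stored back: Tensor.to returns a new tensor and leaves the original untouched. The dtype conversion below is only a stand-in chosen so the example also runs on a machine without a GPU; the same rule applies to device moves such as .to('cuda').

```python
import torch

# Fall back to the CPU when no GPU is present (assumption for portability).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

example = torch.rand(1, 3, 224, 224)        # float32 tensor on the CPU

example.to(torch.float64)                   # result is discarded
dtype_after_discard = example.dtype         # unchanged: torch.float32

example = example.to(torch.float64)         # result is stored back
dtype_after_assign = example.dtype          # now torch.float64

example = example.to(device)                # same pattern for the device
print(dtype_after_discard, dtype_after_assign, example.device.type)
```

Note that nn.Module.to behaves differently: it moves the module's parameters in place, which is why model.to('cuda') works without reassignment.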

draaken0 · October 6, 2020, 9:51am

I tried that too, but the error is still the same.

example = torch.rand(1, 3, 224, 224)
example = example.cuda()
model_trace = torch.jit.trace(modelx, example)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Shisho_Sama · October 6, 2020, 11:19am

What do you get when you run this?

print(f'modelx.device : {next(modelx.parameters()).device}' )
print(f'example.device : {example.device}')

draaken0 · October 6, 2020, 11:34am

I get this

modelx.device : cuda:0
example.device : cuda:0

It still throws that error. I am so confused right now.

Shisho_Sama · October 6, 2020, 11:35am

Can you show the full stacktrace?
(Side note: you always need to call .eval() on your model before tracing it.)
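As a rough illustration of the side note, the sketch below uses a toy network with a Dropout layer (an assumption; it is unrelated to the detection model in this thread). In training mode dropout is random, while in eval mode the output is deterministic, which is the behaviour you want recorded in a trace.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy network (hypothetical) with a layer that behaves differently in train/eval.
net = nn.Sequential(nn.Linear(8, 8), nn.Dropout(p=0.5))
x = torch.ones(1, 8)

net.eval()  # dropout becomes a no-op in eval mode
with torch.no_grad():
    eval_a = net(x)
    eval_b = net(x)

deterministic_in_eval = torch.equal(eval_a, eval_b)

# Trace only after switching to eval mode, so the trace matches inference.
traced = torch.jit.trace(net, x)
matches_eager = torch.allclose(traced(x), eval_a)

print(deterministic_in_eval, matches_eager)
```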

draaken0 · October 6, 2020, 11:37am

Code


example = torch.rand(1, 3, 224, 224)
example = example.to('cuda')

print(f'modelx.device : {next(modelx.parameters()).device}' )
print(f'example.device : {example.device}')

model_trace = torch.jit.trace(modelx, example)

Output Trace

modelx.device : cuda:0
example.device : cuda:0

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-31-d2e0f4962a80> in <module>
      4 print(f'modelx.device : {next(modelx.parameters()).device}' )
      5 print(f'example.device : {example.device}')
----> 6 model_trace = torch.jit.trace(modelx, example)

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\jit\__init__.py in trace(func, example_inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
    953         return trace_module(func, {'forward': example_inputs}, None,
    954                             check_trace, wrap_check_inputs(check_inputs),
--> 955                             check_tolerance, strict, _force_outplace, _module_class)
    956 
    957     if (hasattr(func, '__self__') and isinstance(func.__self__, torch.nn.Module) and

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\jit\__init__.py in trace_module(mod, inputs, optimize, check_trace, check_inputs, check_tolerance, strict, _force_outplace, _module_class, _compilation_unit)
   1107             func = mod if method_name == "forward" else getattr(mod, method_name)
   1108             example_inputs = make_tuple(example_inputs)
-> 1109             module._c._create_method_from_trace(method_name, func, example_inputs, var_lookup_fn, strict, _force_outplace)
   1110             check_trace_method = module._c._get_method(method_name)
   1111 

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    718                 input = result
    719         if torch._C._get_tracing_state():
--> 720             result = self._slow_forward(*input, **kwargs)
    721         else:
    722             result = self.forward(*input, **kwargs)

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\nn\modules\module.py in _slow_forward(self, *input, **kwargs)
    702                 recording_scopes = False
    703         try:
--> 704             result = self.forward(*input, **kwargs)
    705         finally:
    706             if recording_scopes:

~\Anaconda3\envs\tf_gpu\lib\site-packages\torchvision\models\detection\generalized_rcnn.py in forward(self, images, targets)
     96         if isinstance(features, torch.Tensor):
     97             features = OrderedDict([('0', features)])
---> 98         proposals, proposal_losses = self.rpn(images, features, targets)
     99         detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
    100         detections = self.transform.postprocess(detections, images.image_sizes, original_image_sizes)

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    718                 input = result
    719         if torch._C._get_tracing_state():
--> 720             result = self._slow_forward(*input, **kwargs)
    721         else:
    722             result = self.forward(*input, **kwargs)

~\Anaconda3\envs\tf_gpu\lib\site-packages\torch\nn\modules\module.py in _slow_forward(self, *input, **kwargs)
    702                 recording_scopes = False
    703         try:
--> 704             result = self.forward(*input, **kwargs)
    705         finally:
    706             if recording_scopes:

~\Anaconda3\envs\tf_gpu\lib\site-packages\torchvision\models\detection\rpn.py in forward(self, images, features, targets)
    491         proposals = self.box_coder.decode(pred_bbox_deltas.detach(), anchors)
    492         proposals = proposals.view(num_images, -1, 4)
--> 493         boxes, scores = self.filter_proposals(proposals, objectness, images.image_sizes, num_anchors_per_level)
    494 
    495         losses = {}

~\Anaconda3\envs\tf_gpu\lib\site-packages\torchvision\models\detection\rpn.py in filter_proposals(self, proposals, objectness, image_shapes, num_anchors_per_level)
    392 
    393         # select top_n boxes independently per level before applying nms
--> 394         top_n_idx = self._get_top_n_idx(objectness, num_anchors_per_level)
    395 
    396         image_range = torch.arange(num_images, device=device)

~\Anaconda3\envs\tf_gpu\lib\site-packages\torchvision\models\detection\rpn.py in _get_top_n_idx(self, objectness, num_anchors_per_level)
    372                 pre_nms_top_n = min(self.pre_nms_top_n(), num_anchors)
    373             _, top_n_idx = ob.topk(pre_nms_top_n, dim=1)
--> 374             r.append(top_n_idx + offset)
    375             offset += num_anchors
    376         return torch.cat(r, dim=1)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Yes, I did call model.eval() just after loading the model.

Shisho_Sama · October 6, 2020, 11:47am

How is your modelx defined?

draaken0 · October 6, 2020, 11:52am

def get_model(classes):
    num_classes = len(classes)
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes+1)
    model.to('cuda')
    _classes = ['__background__'] + classes
    int_mapping = {label: index for index, label in enumerate(_classes)}
    return model, int_mapping

modelx,_ = get_model(classes)
modelx.load_state_dict(torch.load(model_path))
modelx.eval()

draaken0 · October 6, 2020, 11:58am

I have trained the model on a custom dataset and deployed it, and it works fine. There is no issue in production. Then I wanted to deploy it on an Android device, and for that I needed a traced file. That’s where all this started.

draaken0 · October 6, 2020, 1:25pm

@Shisho_Sama I just found out that the Mask R-CNN or Faster R-CNN models don’t support torch.jit.trace(); they only support torch.jit.script().
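For context on the difference between the two, here is a generic sketch with a toy module (SignDependent is hypothetical and has nothing to do with the torchvision detection code). Tracing records only the operations executed for the example input, so a data-dependent branch gets baked in, whereas scripting compiles the Python control flow itself.

```python
import warnings

import torch
from torch import nn


class SignDependent(nn.Module):
    """Toy module (hypothetical) whose result depends on the input values."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x + 1
        return x - 1


module = SignDependent().eval()
positive = torch.ones(3)
negative = -torch.ones(3)

with warnings.catch_warnings():
    # Tracing warns about converting a tensor to a Python bool.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(module, positive)

scripted = torch.jit.script(module)

traced_out = traced(negative)      # follows the branch recorded for `positive`
scripted_out = scripted(negative)  # re-evaluates the condition at run time
eager_out = module(negative)

print(torch.equal(scripted_out, eager_out), torch.equal(traced_out, eager_out))
# prints: True False
```

For the model in this thread, the analogous call would be torch.jit.script(modelx) in place of torch.jit.trace(modelx, example); note that torch.jit.script takes the module itself, not an example input.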