PyTorch and torchvision on arm64 Nvidia hardware

Hello,
I have trained a Faster R-CNN model with a ResNet-50 backbone in PyTorch to detect cars and trucks. The training was done on an x86 computer, and on that same computer the trained model detects cars and trucks and draws their bounding boxes correctly.

The trained model was then copied to an arm64 device running Linux and Python.
When the model is used on that device to detect cars and trucks, the following error is raised:

Traceback (most recent call last):
  File "/home/nvidia/Desktop/test_env/python_programs/pytorch_cnn/1.obj_det/detection_car_truck/main_det.py", line 82, in <module>
    results = score_frame(frame, model,device=device)
  File "/home/nvidia/Desktop/test_env/python_programs/pytorch_cnn/1.obj_det/detection_car_truck/obj_det_functions.py", line 118, in score_frame
    results = model([frame])
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/models/detection/generalized_rcnn.py", line 100, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/models/detection/roi_heads.py", line 752, in forward
    box_features = self.box_roi_pool(features, proposals, image_shapes)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 213, in forward
    self.setup_scales(x_filtered, image_shapes)
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 174, in setup_scales
    scales = [self.infer_scale(feat, original_input_shape) for feat in features]
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 174, in <listcomp>
    scales = [self.infer_scale(feat, original_input_shape) for feat in features]
  File "/home/nvidia/Softwares/miniconda3/envs/torch_cp37/lib/python3.7/site-packages/torchvision/ops/poolers.py", line 158, in infer_scale
    assert possible_scales[0] == possible_scales[1]
AssertionError
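The failing check at the bottom of the traceback can be mirrored in plain Python to see what it compares. The sketch below is a reconstruction (not an import) of the logic in `infer_scale` from torchvision 0.8's `torchvision/ops/poolers.py`, and it may differ in other releases: for each feature map, the height ratio and the width ratio to the largest resized input image are each rounded to the nearest power of two, and the assertion fires when the two rounded scales disagree. The helper name `per_dim_scales` and the sizes used are hypothetical, chosen only for illustration.

```python
import math


def per_dim_scales(feature_hw, image_hw):
    """Reconstructed per-dimension scale computation of infer_scale.

    feature_hw: (height, width) of one FPN feature map
    image_hw:   (height, width) of the largest resized input image
    Returns (height_scale, width_scale); torchvision asserts they are equal.
    """
    scales = []
    for feat_size, img_size in zip(feature_hw, image_hw):
        approx_scale = float(feat_size) / float(img_size)
        # Round log2 of the ratio to the nearest integer, i.e. snap to a power of two
        scales.append(2.0 ** round(math.log2(approx_scale)))
    return tuple(scales)


# Stride-4 feature map of an input resized to 800 x 1066 (padded to 800 x 1088)
print(per_dim_scales((200, 272), (800, 1066)))  # (0.25, 0.25): check passes
# Same feature map, but a reported image width that does not match it
print(per_dim_scales((200, 272), (800, 544)))   # (0.25, 0.5): would trigger the AssertionError
```

Printing the feature map sizes and `images.image_sizes` on both machines and feeding them through such a check is one way to see which dimension disagrees.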

Can someone help me with this issue?

Is a newly initialized model also failing on the Jetson device, or is the failure only observed after loading the pretrained model? In the latter case, how did you store the model, and how are you loading it? Are you mixing different PyTorch or torchvision versions?

Thanks for the reply.

Workstation
The model is saved on my workstation as follows:

torch.save(model_fast_r_cnn.state_dict(), "./data/myModels/car_truck_proai.pth")  # save the model's state_dict

The torch and torchvision versions on the workstation are:

torch                   1.7.0
torchvision             0.8.1

The saved state_dict file is then copied to the Nvidia hardware.

Nvidia hardware
The following step loads the model successfully on the Nvidia-ProAI hardware:

model.load_state_dict(torch.load(data['Model']['path']))  # model loaded using yaml file
model.eval()  # put the model in evaluation mode

The problem occurs when a video frame or image is passed to the model for object detection:

results = model([frame])  # pass the frame to the model

The torch and torchvision versions on the Nvidia hardware are:

torch              1.7.0a0
torchvision        0.8.0a0+2f40a48


Please let me know if further information is needed. Thank you in advance.

The library versions on the Nvidia-ProAI hardware are older than the ones used to train the model and save the state_dict. PyTorch is backward compatible (you can load a checkpoint created with an older PyTorch release in a newer one) but not forward compatible. Either update the libraries on the Jetson or use an older release to store the checkpoint.
However, it is still unclear whether a randomly initialized model would also fail with the older library versions, so I would test that first.

I trained with an older version of PyTorch on the workstation:

# On Workstation  -- older version
torch              1.6.0
torchvision        0.7.0

I then copied it to the Nvidia hardware:

# On Nvidia-ProAI  -- Newer version
torch              1.7.0a0
torchvision        0.8.0a0+2f40a48

The same error still occurs.

The problem seems similar to the one in this post: https://discuss.pytorch.org/t/assertion-error-when-using-resnet-18-as-backbone-with-faster-rcnn/113965.

Were you able to test the randomly initialized model as previously described?

I did some research but could not figure out how to randomly initialize an object detection model. Could you please give an example of a randomly initialized object detection model?

By the way, object detection does work on the Nvidia hardware with the RetinaNet model instead of Faster R-CNN, as shown below:

model = torchvision.models.detection.retinanet_resnet50_fpn(pretrained=True)

Initialize your model in the same way (with pretrained set to True or False) and check whether the model works at all. Your initial question seemed to target the model loading specifically, but I'm unsure whether the setup you are using on the Nvidia-ProAI device works with this model at all.

The model seems to download and load properly; the following code appears to work:

Nvidia-ProAI

import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor


def get_model(num_classes):
    """
    Downloads the model and returns it with the specified number of output classes.
    Args:
        num_classes: number of output classes (including the background class)

    Returns:
        The model with its box predictor replaced by one for num_classes classes
    """
    # load an object detection model pre-trained on COCO
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    return model


def load_saved_model():
    """Loads the saved model.
    See the config file for the parameters.
    """
    data = yaml_loader()  # my own helper that reads the YAML config file
    model = get_model(data['Model']['num_classes'])  # build the architecture
    model.load_state_dict(torch.load(data['Model']['path']))  # load the saved weights
    model.eval()  # put the model in evaluation mode
    return model


# Using the saved model
model = load_saved_model()

The problem seems to occur only when the video frame is passed to the model for object detection:

def score_frame(frame, model, device):
    """
    Moves the model to the given device and runs inference on one frame.
    The results contain the labels, the bounding-box coordinates,
    and the scores of each object detected in the frame.

    Args:
        frame: image frame
        model: object detection model
        device: device used for inference (e.g. "cuda" or "cpu")
    Returns:
        results: list with one dict per input image, holding the
                 "boxes", "labels" and "scores" of the detections
    """
    # Move the model to the device
    model.to(device)
    # Pass the frame to the model
    results = model([frame])             # Error

    return results


# Score the frame
results = score_frame(frame, model, device=device)   # Error
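Since the error only appears once a frame reaches the model, it may be worth checking what `frame` actually is at that point. torchvision detection models expect each list element to be a 3 x H x W floating-point tensor with values in [0, 1]. The sketch below assumes, hypothetically, that frames come from OpenCV (`cv2.VideoCapture.read()` returns H x W x 3 `uint8` arrays in BGR order) and that the model was trained on RGB images; `frame_to_tensor` and `describe_frame` are illustrative helper names, not part of any library.

```python
import numpy as np
import torch


def frame_to_tensor(frame_bgr, device="cpu"):
    """Convert an H x W x 3 uint8 BGR array into a 3 x H x W float RGB tensor in [0, 1]."""
    if frame_bgr.ndim != 3 or frame_bgr.shape[2] != 3:
        raise ValueError(f"expected an H x W x 3 array, got shape {frame_bgr.shape}")
    rgb = np.ascontiguousarray(frame_bgr[:, :, ::-1])  # BGR -> RGB as a contiguous copy
    tensor = torch.from_numpy(rgb).permute(2, 0, 1)    # HWC -> CHW
    return tensor.float().div(255.0).to(device)        # [0, 255] -> [0.0, 1.0]


def describe_frame(frame):
    """Print what the detection model is about to receive."""
    print("type: ", type(frame).__name__)
    print("shape:", tuple(frame.shape))
    print("dtype:", frame.dtype)
    print("range:", float(frame.min()), "to", float(frame.max()))


# Stand-in for a video frame: a tiny all-blue image in BGR order
fake_frame = np.zeros((4, 5, 3), dtype=np.uint8)
fake_frame[..., 0] = 255
tensor_frame = frame_to_tensor(fake_frame)
describe_frame(tensor_frame)
```

Under that assumption, the call in `score_frame` would become `model([frame_to_tensor(frame, device)])`, and calling `describe_frame` just before it on both machines shows whether the input differs between them.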

I hope my question is clear. Please let me know if further information is required. Thank you in advance.