Real-time semantic segmentation on Jetson Nano

Hello everyone,
I have a semantic segmentation model in PyTorch (DeepLabV3 architecture with a MobileNetV3-Large backbone). I want to deploy it on a Jetson Nano and run it in real time. I have two questions:

  1. When I run prediction on the Jetson Nano’s CPU, it takes around 7 s per frame, but on CUDA/GPU it takes about 38 s. What am I doing wrong here?
  2. What other techniques, such as pruning or truncation, could I apply to shrink my model and make it faster? I have seen several approaches, but I could not get them to work for my model.

I would be really grateful if somebody could help me.

Thank you very much

Hi Didarul,

For 1), it might be better to ask on the Jetson Nano forums; they probably know better.
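
One thing that may be worth ruling out first (this is an assumption, since the timing code is not shown in the question): CUDA calls are asynchronous and the first few GPU runs include one-time initialization, so timing a single frame can look much slower than steady-state inference. Below is a minimal sketch of a fairer measurement. `benchmark` and `time_torch_model` are hypothetical helper names, and `model` / `frame` stand for your own network and input tensor.

```python
import time


def benchmark(fn, warmup=3, iters=10, sync=None):
    """Return the mean seconds per call of fn(), excluding warm-up calls.

    sync is an optional callable that blocks until pending device work
    is finished (for CUDA, torch.cuda.synchronize).
    """
    for _ in range(warmup):
        fn()  # warm-up runs absorb one-time initialization costs
    if sync is not None:
        sync()  # make sure warm-up work is done before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if sync is not None:
        sync()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters


def time_torch_model(model, frame, device="cuda"):
    """Hypothetical wrapper: time one forward pass of model on frame."""
    import torch  # imported lazily so benchmark() works without PyTorch

    model = model.to(device).eval()
    frame = frame.to(device)

    def step():
        with torch.no_grad():  # inference only, no autograd bookkeeping
            model(frame)

    sync = torch.cuda.synchronize if device == "cuda" else None
    return benchmark(step, sync=sync)
```

If the per-frame GPU time is still far above the CPU time after warm-up, that points at something other than the measurement, and the Jetson forums are the right place to dig further.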
For 2), you can check out our quantization tutorials for examples of how to speed up your model with quantization.
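
To give a rough idea of what quantization does to the weights, here is a small pure-Python sketch of affine int8 quantization. It only illustrates the arithmetic and is not the PyTorch quantization API; `quantize_int8` and `dequantize` are hypothetical names chosen for this example.

```python
def quantize_int8(values):
    """Map a list of floats to int8 using an affine (scale, zero_point) scheme."""
    # The representable range must include 0 so that 0.0 maps to an integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-128 - lo / scale)
    # Round to the nearest step, shift by zero_point, clamp to the int8 range.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
q, scale, zero_point = quantize_int8(weights)
restored = dequantize(q, scale, zero_point)
```

Each weight is stored in 8 bits instead of 32, at the cost of a rounding error of at most one quantization step (`scale`) per value. The tutorials cover how to apply this to a real model, including calibration.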

Thank you very much, Jesse!