Hello everyone,
I have a semantic segmentation model in PyTorch (DeepLabV3 architecture with a MobileNetV3-Large backbone). I want to deploy that model on a Jetson Nano in real time. I have 2 questions:
- When I run prediction on the Jetson Nano's CPU, it takes around 7 s per frame, but on CUDA/GPU it takes around 38 s. What am I doing wrong here?
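For reference, my measurement is roughly along these lines (a simplified sketch, not my exact code; the tiny conv layer stands in for the real model). Note that I'm calling `torch.cuda.synchronize()` and doing a few warm-up passes, since CUDA launches are asynchronous and the first GPU call pays one-time initialization cost:

```python
import time
import torch
import torch.nn as nn

# Illustrative stand-in for the DeepLabV3 + MobileNetV3 model
model = nn.Conv2d(3, 21, kernel_size=3, padding=1)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device).eval()

x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    # Warm-up: the first CUDA calls pay allocation/initialization cost
    for _ in range(3):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for warm-up kernels to finish

    start = time.time()
    out = model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()  # ensure GPU work is done before stopping the clock
    elapsed = time.time() - start

print(out.shape, f"{elapsed:.4f}s")
```

Without the synchronize/warm-up steps, a GPU timing can be dominated by one-time startup cost, so I wonder if that explains part of the 38 s number.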
- What other techniques could I apply to shrink the model and make it faster, such as pruning or quantization? I have seen several approaches, but I have not managed to get them working for my model.
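For example, the pruning I attempted was roughly like this (a minimal sketch using `torch.nn.utils.prune`; the single conv layer is illustrative, not my actual backbone):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Tiny stand-in layer (illustrative only)
conv = nn.Conv2d(3, 8, kernel_size=3)

# Zero out the 50% of weights with the smallest L1 magnitude
prune.l1_unstructured(conv, name="weight", amount=0.5)

sparsity = float((conv.weight == 0).sum()) / conv.weight.numel()
print(f"sparsity: {sparsity:.2f}")

# Make the pruning permanent (removes the mask re-parametrization)
prune.remove(conv, "weight")
```

My understanding is that unstructured pruning like this only zeroes weights; it doesn't actually speed up dense inference unless the runtime exploits sparsity, which may be why I saw no improvement.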
I would be really grateful if somebody could help me.
Thank you very much
Didarul