I am working on a research project comparing semantic segmentation and object detection algorithms in terms of speed. The semantic segmentation model I am using runs about 10 fps faster than the object detection model. But when I add code to move the output tensor from the GPU to the CPU, the segmentation model slows down (probably because the tensor is large), and the difference shrinks to 2 fps.
prediction = prediction.cpu()
Here prediction is a tensor of shape 1 x 2 x 1920 x 1080.
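One idea I am considering is to shrink the tensor on the GPU before copying it. The sketch below assumes the two channels are per-class scores stored as float32, so only the per-pixel argmax is needed for contour finding. The random tensor is a stand-in for my model output, and the name mask is just for illustration.

```python
import torch

# Fall back to the CPU so the sketch also runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the model output: 1 x 2 x 1920 x 1080 float32 scores (assumption).
prediction = torch.randn(1, 2, 1920, 1080, device=device)

# Collapse the class dimension on the device and store the labels as uint8,
# so 1 byte per pixel is copied instead of 2 channels x 4 bytes per pixel.
mask = prediction.argmax(dim=1).squeeze(0).to(torch.uint8)

# Copy the small label map to host memory and view it as a NumPy array.
mask_np = mask.cpu().numpy()

print(prediction.numel() * prediction.element_size(), "bytes before reduction")
print(mask_np.nbytes, "bytes after reduction")
```

Under these assumptions the copy is 8 times smaller. I have not measured how much of the lost fps this recovers.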
I want to find the contours of prediction once it has been converted to a NumPy array. Is there any way to speed this up, either by making the conversion faster or by performing the contour detection on the GPU? (I couldn't find a GPU implementation.)
Any resources or ideas that could help in this situation would be much appreciated.
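The closest thing to GPU contour detection I can think of is a morphological boundary: mark every foreground pixel that has a background pixel in its 3 x 3 neighbourhood. The sketch below assumes a binary 0/1 mask; boundary_mask is a hypothetical helper name, and the result is only a boundary pixel map, not the ordered point lists that a CPU contour tracer would return.

```python
import torch
import torch.nn.functional as F


def boundary_mask(mask: torch.Tensor) -> torch.Tensor:
    """Mark foreground pixels of a binary H x W mask that touch the background."""
    m = mask.float()[None, None]  # shape 1 x 1 x H x W
    # 3x3 erosion, written as a negated max-pool (a min-pool).
    # Positions outside the image are ignored by the pooling window.
    eroded = -F.max_pool2d(-m, kernel_size=3, stride=1, padding=1)
    # Foreground pixels removed by the erosion are the boundary pixels.
    return (m - eroded)[0, 0].to(torch.uint8)


# Runs on the GPU when one is available, otherwise on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy example: a 3 x 3 square of ones inside a 5 x 5 image.
mask = torch.zeros(5, 5, dtype=torch.uint8, device=device)
mask[1:4, 1:4] = 1

edges = boundary_mask(mask)
print(edges.cpu().numpy())
```

The output stays on the same device as the input, so it could be computed before the transfer, but it is still a full H x W map and would not reduce the copy size on its own.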
Best.