Model runs faster on CPU than on GPU

I’m loading a yolov5 model on GPU in the following way:

model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

I have measured its performance:

from timeit import default_timer as timer

start = timer()
results = model('test_image.jpg')
end = timer()
print(end - start)

It gives me around 1 second, which is very slow. However, when I run it on CPU it takes only about 0.2 seconds:

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', device='cpu')

Why is it so slow on the GPU, and how can I boost its performance?

CUDA operations are executed asynchronously, so you would need to synchronize the code via torch.cuda.synchronize() before starting and stopping the timer. Besides that, you should add warmup iterations and profile a few steps to get valid results.
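A minimal sketch of that timing pattern, using a small stand-in module (a single `Conv2d`) so it runs on any machine; substitute your YOLOv5 model and input in its place:

```python
import torch
from timeit import default_timer as timer

# Illustrative stand-in for the YOLOv5 model; the timing pattern is the point.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = torch.nn.Conv2d(3, 16, kernel_size=3).to(device).eval()
x = torch.randn(1, 3, 224, 224, device=device)

with torch.no_grad():
    # Warmup iterations: the first GPU calls pay one-time costs
    # (CUDA context init, memory allocation, kernel caching).
    for _ in range(5):
        _ = model(x)

    if device == 'cuda':
        torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
    start = timer()
    n_iters = 10
    for _ in range(n_iters):
        _ = model(x)
    if device == 'cuda':
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the timer
    end = timer()

avg = (end - start) / n_iters
print(f"avg inference time: {avg:.4f} s")
```

Without the `synchronize()` calls, the timer can stop while the GPU is still running queued kernels, so the CPU path appears faster than it really is relative to the GPU path.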