How to get fast inference with Pytorch and MXNet model using GPU?

Thanks for the information.

  1. In that case you are adding synchronizations, as numpy uses CPU arrays. Your workflow would therefore probably be:
load data on CPU -> transfer to GPU and use MXNet model -> transfer back to CPU -> transform to PyTorch tensor and transfer back to GPU -> use forward pass of PyTorch model -> prediction

which most likely won’t benefit much from the GPU. You could try running both models on the CPU only and compare the processing times. Since you are synchronizing and also transferring the data between the host and device multiple times, the GPU utilization could be low.
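As a rough illustration of the transfers involved, here is a minimal sketch of such a pipeline. The models (`mx_model`, `torch_model`) and shapes are placeholders, not taken from your code:

```python
import numpy as np
import mxnet as mx
import torch

# Hypothetical stand-in models, just to make the sketch runnable.
mx_model = mx.gluon.nn.Dense(128)
mx_model.initialize(ctx=mx.gpu(0))
torch_model = torch.nn.Linear(128, 10).to("cuda")

x_np = np.random.rand(1, 64).astype("float32")   # data loaded on the CPU as a numpy array

# CPU -> GPU: create the MXNet array on the GPU context and run the first model
x_mx = mx.nd.array(x_np, ctx=mx.gpu(0))
feat_mx = mx_model(x_mx)

# GPU -> CPU: .asnumpy() synchronizes and copies the result back to the host
feat_np = feat_mx.asnumpy()

# CPU -> GPU again: wrap the numpy result in a PyTorch tensor and move it to the device
feat_torch = torch.from_numpy(feat_np).to("cuda")
with torch.no_grad():
    out = torch_model(feat_torch)                # forward pass of the PyTorch model

pred = out.argmax(dim=1).cpu()                   # prediction transferred back to the CPU
```

Each `.asnumpy()` / `.cpu()` call synchronizes the device and copies data across the PCIe bus, which is where the pipeline loses most of the potential GPU speedup.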

  2. Especially for small inputs, the workload on the GPU is small and you might mainly see the overhead of the kernel launches as well as the data transfers.

  3. Note that CUDA operations in PyTorch are asynchronous, so you would need to synchronize the code manually via torch.cuda.synchronize() before starting and stopping the timer to profile the desired operation.
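For reference, a minimal timing sketch for the PyTorch part; the model and input are again placeholders:

```python
import time
import torch

model = torch.nn.Linear(128, 10).to("cuda")   # placeholder model
x = torch.randn(1, 128, device="cuda")        # small input, so launch overhead dominates

# warm-up iterations so one-time setup costs don't skew the measurement
with torch.no_grad():
    for _ in range(10):
        model(x)

torch.cuda.synchronize()        # make sure all pending GPU work has finished
t0 = time.perf_counter()
with torch.no_grad():
    for _ in range(100):
        out = model(x)
torch.cuda.synchronize()        # wait for the timed kernels before stopping the timer
t1 = time.perf_counter()

print(f"avg forward time: {(t1 - t0) / 100 * 1e3:.3f} ms")
```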
