GPU-efficient PyTorch code

I am using PyTorch to implement an algorithm that is not related to neural networks or ML. Speed is paramount, so I intend to run it on a GPU.
Are there any general guidelines for writing time-efficient code with PyTorch that will best utilize the GPU's processing power and parallelism?

Hi,

If you're working only in Python, the main idea is to perform as few ops as possible, making each one as large as possible. The GPU is great at speeding up a single big op, but it is very bad at switching between ops and very, very bad at executing many small ones.
So make sure everything is stored in Tensors that are as large as possible, and always work with the full Tensor.
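For illustration, here is a minimal, hypothetical sketch (the row-norm computation and tensor sizes are just placeholders, not from the question): the looped version launches one tiny kernel per row, while the vectorized version does the same work in a few large ops over the whole tensor.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(10_000, 1_000, device=device)

# Slow: one tiny kernel launch per row; the GPU spends most of its
# time switching between ops instead of computing.
def row_norms_loop(x):
    out = torch.empty(x.shape[0], device=x.device)
    for i in range(x.shape[0]):
        out[i] = x[i].pow(2).sum().sqrt()
    return out

# Fast: a single pass of large ops over the full tensor.
def row_norms_vectorized(x):
    return x.pow(2).sum(dim=1).sqrt()
```

If you time the two versions yourself, remember that CUDA ops run asynchronously, so call `torch.cuda.synchronize()` before reading the clock.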

Do you have particular code in mind?

Here is what I have learned from my own experience (in addition to the advice above):

  1. Try to express each operation as a matrix multiplication wherever possible (see the sketch after this list).
  2. Avoid gather and select operations.
  3. Don't let the GPU wait for CPU-side batch preparation: make sure batch data is prepared ahead of time or in a background process, so the GPU stays busy all the time (see the data-loading sketch after this list). `nvidia-smi` is your friend for checking GPU utilization.
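As a rough illustration of points 1 and 3, here are two hedged sketches; the shapes, dataset, and variable names are made up for the example and are not from the original posts. The first rewrites a pairwise-distance computation as one big matmul instead of a Python double loop; the second uses a `DataLoader` with background workers and pinned memory so batch preparation overlaps with GPU compute.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 128, device=device)
b = torch.randn(4096, 128, device=device)

# Point 1: pairwise squared Euclidean distances expressed through one
# large matmul instead of a double loop over rows:
#   ||a_i - b_j||^2 = ||a_i||^2 + ||b_j||^2 - 2 * a_i . b_j
dists = (a * a).sum(dim=1, keepdim=True) + (b * b).sum(dim=1) - 2.0 * (a @ b.t())
```

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

# Point 3: prepare batches in background worker processes and use pinned
# memory so host-to-device copies can overlap with GPU compute.
dataset = TensorDataset(torch.randn(100_000, 128))
loader = DataLoader(dataset, batch_size=4096, num_workers=4, pin_memory=True)

for (batch,) in loader:
    batch = batch.to(device, non_blocking=True)  # async copy from pinned memory
    result = batch @ batch.t()                   # keep the GPU busy with big ops
```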