I have benchmarked the performance of PyTorch for a network using 3D convolutions with different GPUs on our cluster, comparing two different PyTorch builds:
- the conda package from the pytorch channel (`py3.8_cuda11.0.221_cudnn8.0.5_0`)
- a PyTorch build on our cluster made with EasyBuild (I will provide a link to the recipe and build options later)
For some configurations the EasyBuild version performs significantly better than the conda package, especially on Volta and Ampere cards with half precision.
For details on the benchmarks, see GitHub - constantinpape/3d-unet-benchmarks.
Is it possible that the conda package is built without the correct options to fully leverage features of the newer architectures, such as Tensor Cores?
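One way to probe this would be to compare which CUDA architectures each build was compiled for, e.g. via `torch.cuda.get_arch_list()` and `torch.__config__.show()` in each environment. As a minimal sketch (the arch lists below are hypothetical, not taken from either build), a build supports a GPU natively if it ships SASS for that compute capability (`sm_XX`), or only via driver JIT if it merely ships older PTX (`compute_XX`):

```python
# Sketch: classify how a PyTorch build's compiled CUDA architectures
# cover a given GPU. Arch strings mimic torch.cuda.get_arch_list();
# the example lists are made up for illustration.

def arch_support(arch_list, major, minor):
    """Return "native" if the build ships SASS for this compute
    capability, "jit" if only older PTX is available (the driver
    must JIT-compile it at load time), or None if unsupported."""
    cc = major * 10 + minor
    support = None
    for arch in arch_list:
        kind, _, num = arch.partition("_")
        num = int(num)
        if kind == "sm" and num == cc:
            return "native"          # binary compiled for this GPU
        if kind == "compute" and num <= cc:
            support = "jit"          # forward-compatible PTX only
    return support

# Hypothetical arch lists:
conda_build = ["sm_37", "sm_50", "sm_60", "sm_70", "compute_70"]
local_build = ["sm_70", "sm_80", "compute_80"]

# Volta (V100) is compute capability 7.0; Ampere (A100) is 8.0.
print(arch_support(conda_build, 7, 0))  # native
print(arch_support(conda_build, 8, 0))  # jit
print(arch_support(local_build, 8, 0))  # native
```

If the conda build only reaches a newer card through JIT-compiled PTX for an older architecture, kernels tuned for that architecture (including Tensor Core paths) may not be selected, which could explain part of the gap.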