.cuda() and .cpu() insanely slow on A4000 with CUDA 11.1

Ubuntu 20.04

PyTorch ‘1.9.0+cu111’

CUDA ‘11.1’

cuDNN 8005

I have a server with A4000s, and they are all running at PCI Express x16.

The first call to .cuda() takes 7 minutes to complete, and transfers between GPU and CPU via .cpu() or .to(device) take forever. Is there
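For anyone trying to reproduce this, here is a minimal timing sketch, assuming PyTorch with CUDA support is installed. It times the first .cuda() call (which includes CUDA context creation), a second one, and a copy back with .cpu(). The helper names `time_call` and `main` and the tensor size are illustrative choices, not anything from the original post.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return

    print("torch:", torch.__version__)
    print("CUDA (build):", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())

    if not torch.cuda.is_available():
        print("CUDA is not available")
        return

    x = torch.randn(1024, 1024)  # arbitrary size, roughly 4 MB of float32

    def to_gpu(t):
        y = t.cuda()
        torch.cuda.synchronize()  # wait until the copy has really finished
        return y

    # The first call also pays for CUDA initialization.
    y, first = time_call(to_gpu, x)
    # The second call should be close to the pure transfer time.
    _, second = time_call(to_gpu, x)
    # Device-to-host copy.
    _, back = time_call(lambda: y.cpu())

    print(f"first .cuda():  {first:.3f} s")
    print(f"second .cuda(): {second:.3f} s")
    print(f".cpu():         {back:.3f} s")


if __name__ == "__main__":
    main()
```

If the first call is slow but the later ones are quick, the time is going into initialization rather than the copy itself. If all three are slow, the transfer path is the more likely suspect.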

Could you update to the latest stable release and rerun your code, please?

I did, and I get the same issue.

Okay, it turns out I needed to install the server version of the base NVIDIA driver and not the regular version. That fixed it.

nvidia-driver-510-server instead of nvidia-driver-510
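A rough way to check what the driver reports after a change like this is to ask nvidia-smi for the driver version and the current PCIe link. The sketch below assumes nvidia-smi is on the PATH. The helper names `parse_gpu_csv` and `query_gpus` are made up for illustration.

```python
import csv
import io
import shutil
import subprocess

# Query fields passed to nvidia-smi --query-gpu
FIELDS = [
    "name",
    "driver_version",
    "pcie.link.gen.current",
    "pcie.link.width.current",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (no header) into one dict per GPU."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [
        dict(zip(FIELDS, (cell.strip() for cell in row)))
        for row in reader
        if row
    ]


def query_gpus():
    """Return GPU info, or an empty list if nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    proc = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_gpu_csv(proc.stdout) if proc.returncode == 0 else []


if __name__ == "__main__":
    for gpu in query_gpus():
        print(gpu)
```

Note that this shows the driver version, not which package provided it. To see whether the -server package is the one installed, `dpkg -l` is the place to look. The reported link generation can also read lower while the GPU is idle, so it is best checked under load.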