Hi, I’ve just switched to a cluster with an A100 GPU, but I’m seeing worse performance than on the card I was using before (a V100). From other discussions, I suspect it could be a cuDNN version issue.
I’m working with PyTorch installed in a conda environment, with the following specifications:
torch.__version__ = '1.13.1'
torch.cuda.get_device_name() = 'NVIDIA A100 80GB PCIe'
torch.version.cuda = '11.6'
torch.backends.cudnn.version() = 8302
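In case it helps anyone reproduce this, the values above can be gathered with a short script along these lines. This is only a minimal sketch: `collect_env` is a hypothetical helper name, device index 0 is an assumption, and the compute capability line is an extra check that was not part of my original output.

```python
def collect_env():
    """Return a dict with PyTorch, CUDA, cuDNN and device info.

    collect_env is a hypothetical helper name used for illustration.
    If torch cannot be imported, the dict only records that fact.
    """
    try:
        import torch
    except ImportError:
        return {"torch": None}

    info = {
        "torch": torch.__version__,
        # CUDA version the PyTorch binary was built against (None on CPU builds)
        "cuda_build": torch.version.cuda,
        # cuDNN version as an integer, e.g. 8302 (None if cuDNN is unavailable)
        "cudnn": torch.backends.cudnn.version(),
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Assumption: the GPU of interest is device 0
        info["device_name"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```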
I’ve read online that the CUDA version should be 11.x for this card, so the installed 11.6 should not be the problem.
Are there any recommended cuDNN versions (or torch versions) for working with an A100? Can the problem be solved via a conda install?