Slow A100, cudnn problem?

Hi, I’ve just switched to a cluster with an A100 GPU, but I’m seeing worse performance than on my previous card (a V100). From other discussions, I believe it could be a cuDNN version issue.
I’m working with a PyTorch installation in a conda environment, with the following specifications:

torch.__version__ = '1.13.1'
torch.cuda.get_device_name(0) = 'NVIDIA A100 80GB PCIe'
torch.version.cuda = '11.6'
torch.backends.cudnn.version() = 8302
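These values can be reproduced with a short snippet along these lines (the device index 0 is an assumption; adjust if you have multiple GPUs):

```python
import torch

# Versions relevant to GPU performance debugging.
print("torch:", torch.__version__)
print("CUDA runtime:", torch.version.cuda)
print("cuDNN:", torch.backends.cudnn.version())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```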

I’ve read online that the CUDA version should be 11.x, so there should not be any problem, since the installed one is 11.6.
Are there any recommended cuDNN (or PyTorch) versions for working with an A100? Can the problem be solved via a conda install?

In case you are using the default float32 for your model training, you might consider enabling TF32 for cuBLAS operations via torch.backends.cuda.matmul.allow_tf32 = True and see if this gives you the desired speedup.
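A minimal sketch of enabling TF32; the cuDNN flag is included for completeness (it already defaults to True in recent releases):

```python
import torch

# Allow TF32 on Ampere GPUs (e.g. A100) for float32 matmuls in cuBLAS.
# TF32 trades a few mantissa bits for throughput, so numerical results
# may differ slightly from strict float32.
torch.backends.cuda.matmul.allow_tf32 = True

# Corresponding flag for cuDNN convolutions (defaults to True in
# PyTorch 1.13, shown here for completeness).
torch.backends.cudnn.allow_tf32 = True
```

Set these once at the start of your training script, before building the model.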
Also, I would recommend updating to the latest PyTorch release with the latest CUDA runtime.