First, I found out that the A100 requires at least:
- CUDA 11.0
- NVIDIA driver 450
So, when using the A100, I adjusted the environment as follows.
- CUDA 11.0
- cuDNN 8005 (checked with torch.backends.cudnn.version())
- NVIDIA driver 450 (more precisely, 450.119.04)
- PyTorch 1.8.0
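A minimal sketch of how the versions in this list could be read back from PyTorch itself. `environment_report` is a hypothetical helper name, not part of PyTorch; the calls inside it are standard PyTorch attributes. Note that `torch.version.cuda` reports the CUDA version the installed wheel was built against, which may differ from the system toolkit (a `+cu111` wheel would presumably report 11.1 rather than 11.0).

```python
try:
    import torch
except ImportError:  # lets the script report a missing install instead of crashing
    torch = None


def environment_report():
    """Collect the versions PyTorch itself reports (hypothetical helper)."""
    if torch is None:
        return {"torch": None}
    report = {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,         # CUDA version the wheel was built against
        "cudnn": torch.backends.cudnn.version(),  # integer form, e.g. 8005 for cuDNN 8.0.5
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

The driver version is not exposed here; it is usually read from `nvidia-smi`.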
I installed PyTorch with:
pip install torch==1.8.0+cu111 torchvision==0.9.0+cu111 torchaudio==0.8.0 -f https://download.pytorch.org/whl/torch_stable.html
However, training my model takes too long.
For example, the model used in the experiment is a complex model with a size of 7.2M.
On the 1080 Ti, 1 epoch (18,480 samples) took 1600 seconds, but on the A100 it takes 2000 seconds.
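To put those epoch times in perspective, here is a small calculation using only the figures quoted above (the variable names are just for illustration):

```python
SAMPLES_PER_EPOCH = 18_480

# Epoch times in seconds, as reported in the post.
epoch_seconds = {"1080 Ti": 1600, "A100": 2000}

# Samples processed per second on each GPU.
throughput = {gpu: SAMPLES_PER_EPOCH / s for gpu, s in epoch_seconds.items()}

# Ratio of A100 epoch time to 1080 Ti epoch time.
slowdown = epoch_seconds["A100"] / epoch_seconds["1080 Ti"]

for gpu, rate in throughput.items():
    print(f"{gpu}: {rate:.2f} samples/s")
print(f"A100 epoch time is {slowdown:.2f}x the 1080 Ti epoch time")
```

That works out to about 11.55 samples/s on the 1080 Ti versus 9.24 samples/s on the A100, so the A100 epoch takes 1.25 times as long.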
I suspect this is a cuDNN version problem, and I want to change cuDNN to 8.0.2 or 8.0.3.
How can I change only the cudnn version in pytorch?
Or is there any other solution to this problem?
I’ve already tried setting torch.backends.cudnn.benchmark = True
to solve this problem, but that made training twice as slow.
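A rough sketch of how the effect of `torch.backends.cudnn.benchmark` could be measured in isolation. With the flag on, cuDNN autotunes its algorithm choice for each new input shape, so timings that include the first iterations (or inputs whose shape keeps changing) can look slower. `time_steps` is a hypothetical helper, and the convolution layer and tensor shape are stand-ins, not the model from this post.

```python
import time

try:
    import torch
except ImportError:
    torch = None


def time_steps(step_fn, n_warmup=10, n_iters=50, sync=None):
    """Average seconds per call of step_fn, excluding warm-up calls."""
    for _ in range(n_warmup):
        step_fn()
    if sync is not None:
        sync()  # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn()
    if sync is not None:
        sync()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / n_iters


if torch is not None and torch.cuda.is_available():
    # Stand-in workload: one convolution on a fixed-size batch.
    conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1).cuda()
    x = torch.randn(32, 64, 56, 56, device="cuda")
    for flag in (False, True):
        torch.backends.cudnn.benchmark = flag
        per_step = time_steps(lambda: conv(x), sync=torch.cuda.synchronize)
        print(f"cudnn.benchmark={flag}: {per_step * 1e3:.3f} ms per forward pass")
```

CUDA kernels launch asynchronously, so the `torch.cuda.synchronize` calls are there to make the wall-clock numbers reflect finished GPU work.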