I’m getting a CUDNN_STATUS_INTERNAL_ERROR on machines that have a Quadro RTX 8000 GPU.
It happens only when I set cudnn.benchmark = True.
Other machines with the same Python/PyTorch/CUDA/cuDNN versions installed do not show the error.
I was able to reproduce the error with the minimal script below:
import time
import torch
from torch import nn, optim
import torch.utils.data as data_utils
import torchvision.models as models
from torch.backends import cudnn
from torch.nn import functional as F
cudnn.enabled = True
cudnn.benchmark = True
device = torch.device("cuda")
torch.cuda.set_device(2)
batch_size = 1
img_size = 1024
N = 320
lr = 0.001
channels = 3
train_data = torch.randn(N, channels, img_size, img_size)
train_labels = torch.ones(N).long()
train = data_utils.TensorDataset(train_data, train_labels)
train_loader = data_utils.DataLoader(train, batch_size=batch_size, shuffle=True, pin_memory=True)
criterion = nn.CrossEntropyLoss().cuda()
model = models.densenet161().cuda()
#model = models.resnet18().cuda()
model.train()
optimizer = optim.Adam(model.parameters(), lr=lr)
for x, y in train_loader:
    # 'cuda' resolves to the current device selected by torch.cuda.set_device above
    x, y = x.to('cuda', non_blocking=True), y.to('cuda', non_blocking=True)
    pred = model(x)
    print('forward done')
    loss = criterion(pred, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Some observations while running the above code on the machines that give the error:
- When I decrease img_size to 512, the error does not occur.
- When I switch the model to resnet18, the error does not occur.
- Changing the batch size does not seem to have any effect.
I would appreciate any help or information to help debug the source of the problem.
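To narrow this down further, the observations above could be turned into a small grid of checks. The following is a minimal sketch, not a verified fix: try_config and tiny_model are hypothetical names made up for illustration (they are not part of PyTorch), and the tiny stand-in model is only there so that the snippet also runs on a CPU-only machine.

```python
import torch
from torch import nn
from torch.backends import cudnn


def tiny_model():
    """Hypothetical stand-in network; swap in a torchvision model on the real machine."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(8, 10),
    )


def try_config(make_model, img_size, benchmark, device):
    """Run one forward/backward pass and report whether it succeeded."""
    cudnn.benchmark = benchmark
    model = make_model().to(device).train()
    x = torch.randn(1, 3, img_size, img_size, device=device)
    y = torch.ones(1, dtype=torch.long, device=device)
    try:
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()
        if device.type == "cuda":
            # CUDA kernels run asynchronously; synchronize so errors surface here
            torch.cuda.synchronize(device)
        return "ok"
    except RuntimeError as err:
        return f"failed: {err}"


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for benchmark in (False, True):
    for img_size in (32, 64):
        result = try_config(tiny_model, img_size, benchmark, device)
        print(f"benchmark={benchmark} img_size={img_size}: {result}")
```

On the affected machine, passing models.densenet161 or models.resnet18 as make_model with img_size set to 512 or 1024 would cover the combinations listed above. Since a CUDA error can leave the process in a bad state, it is probably safer to run each combination in a fresh process. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 should also make the stack trace point closer to the failing call.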
Library versions:
Python 3.7.5
PyTorch 1.3.1
CUDA 10.1
cuDNN 7.6.3
Ubuntu 18.04
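To compare the working and failing machines, the versions above can also be read from the running interpreter rather than from the package manager. A minimal sketch, where environment_report is a hypothetical helper name chosen for illustration:

```python
import platform

import torch
from torch.backends import cudnn


def environment_report():
    """Collect the version details that matter for a cuDNN bug report."""
    report = {
        "python": platform.python_version(),
        "pytorch": torch.__version__,
        "cuda": torch.version.cuda,  # None on CPU-only builds
        "cudnn": cudnn.version(),    # None if cuDNN is not available
        "gpus": [],
    }
    if torch.cuda.is_available():
        report["gpus"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running this on each machine and diffing the output is a quick way to confirm that the builds really match. It does not report the NVIDIA driver version, which would have to be checked separately (for example with nvidia-smi).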
I also understand from other posts that cudnn.benchmark makes cuDNN search for the fastest implementation for the particular hardware and input size, and then reuse it throughout training. However, I could not find a good explanation of what cudnn.enabled does. Even with cudnn.benchmark = False, I get a performance boost just by setting cudnn.enabled = True. How does it help?
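One way to see what each flag contributes is to time the same layer under every combination of the two. The sketch below assumes a hypothetical helper time_conv (not a PyTorch function) and a single small convolution, so the numbers are only indicative. On a CPU-only machine the flags should make no difference, since cuDNN is not involved there.

```python
import time

import torch
from torch import nn
from torch.backends import cudnn


def time_conv(enabled, benchmark, device, img_size=64, iters=5):
    """Average seconds per forward/backward pass of one conv layer."""
    saved = (cudnn.enabled, cudnn.benchmark)
    cudnn.enabled, cudnn.benchmark = enabled, benchmark
    try:
        conv = nn.Conv2d(3, 16, kernel_size=3, padding=1).to(device)
        x = torch.randn(1, 3, img_size, img_size, device=device)
        # Warm-up pass, kept out of the timing; with benchmark=True any
        # algorithm search would happen on this first call
        conv(x).sum().backward()
        if device.type == "cuda":
            torch.cuda.synchronize(device)
        start = time.perf_counter()
        for _ in range(iters):
            conv(x).sum().backward()
        if device.type == "cuda":
            torch.cuda.synchronize(device)
        return (time.perf_counter() - start) / iters
    finally:
        # Restore the global flags so later code is unaffected
        cudnn.enabled, cudnn.benchmark = saved


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
for enabled in (False, True):
    for benchmark in (False, True):
        seconds = time_conv(enabled, benchmark, device)
        print(f"enabled={enabled} benchmark={benchmark}: {seconds * 1e3:.3f} ms")
```

The warm-up pass is excluded from the timing on purpose, so that a one-time search cost with benchmark=True is not mixed into the per-iteration figure.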