RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED

I have the following model:

model = torch.nn.Sequential(
        torch.nn.Conv3d(1, 128, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv3d(128, 128, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv3d(128, 128, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv3d(128, 128, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv3d(128, 128, kernel_size=3, padding=1),
        torch.nn.ReLU(),
        torch.nn.Conv3d(128, 1, kernel_size=3, padding=1),
        torch.nn.ReLU()
    ).to(device)

For this model I use input batches of shape [N, 1, 160, 256, 256] (batch, channels, depth, height, width).
I was running into RuntimeError: CUDA out of memory, so I reduced the batch size to 2, which resulted in this error instead:
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_SUPPORTED. This error may appear if you passed in a non-contiguous input.
A quick Google search suggested that my batch size was still too large, so I reduced it to 1, which surprisingly gave me the RuntimeError: CUDA out of memory error again.
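As a sanity check for the "non-contiguous input" hint in the error message, the input can be made contiguous (and given the channel dimension Conv3d expects) before the forward pass. This is a sketch with small stand-in dimensions and an assumed input tensor `x`; the real volumes would be [N, 1, 160, 256, 256]:

```python
import torch

# Small stand-in for the real volumes to keep the example cheap.
x = torch.randn(2, 8, 8, 8)   # [N, D, H, W], no channel dim yet
x = x.unsqueeze(1)            # -> [N, 1, D, H, W], as Conv3d expects
x = x.contiguous()            # no-op if already contiguous, otherwise copies
print(x.shape, x.is_contiguous())
```

If `is_contiguous()` already returns True, contiguity is not the cause and the error is most likely a symptom of running out of memory inside cuDNN.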

How can I resolve this issue?

Thank you.

The first two nn.Conv3d layers alone, with an input of [1, 1, 160, 256, 256], will already take ~20GB during the forward pass (the intermediate activations are needed for the backward pass). The complete model will thus have a much higher memory usage, and I assume your GPU doesn't have enough memory.
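The ~20GB figure can be checked with a back-of-the-envelope calculation: each 128-channel feature map at full resolution costs about 5 GiB in float32, and the first two Conv3d + ReLU pairs keep four such maps alive for the backward pass (assuming non-inplace ReLU):

```python
# One 128-channel activation for a [1, 1, 160, 256, 256] input, float32.
elems = 1 * 128 * 160 * 256 * 256   # elements in one feature map
bytes_per_map = elems * 4           # float32 = 4 bytes per element
gib = bytes_per_map / 1024**3
print(f"one activation: {gib:.1f} GiB")
# conv1 + relu1 + conv2 + relu2 -> 4 such maps -> ~20 GiB
```

The remaining three 128-channel convolutions add several more such activations, so even a 24GB card will run out of memory at this resolution.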
