Facing CUDNN_STATUS_INTERNAL_ERROR during inference

I am running inference on a PyTorch model using 2 GPUs.
The error I get is:

RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

Complete stack trace:

Loaded model config from [None]
0/2032
Traceback (most recent call last):
  File "/home/paperspace/StableVITON-new/inference.py", line 110, in <module>
    main(args)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/inference.py", line 68, in main
    z, c = model.get_input(batch, params.first_stage_key)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/cldm/cldm.py", line 73, in get_input
    x, c = super().get_input(batch, self.first_stage_key, *args, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/models/diffusion/ddpm.py", line 813, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/models/diffusion/ddpm.py", line 897, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/home/paperspace/StableVITON-new/ldm/models/autoencoder.py", line 85, in encode
    h = self.encoder(x)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/modules/diffusionmodules/model.py", line 523, in forward
    hs = [self.conv_in(x)]
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

The command I am running is:

CUDA_VISIBLE_DEVICES=2 python inference.py \
  --config_path ./configs/VITONHD.yaml \
  --batch_size 1 \
  --model_load_path ./ckpts/viton.ckpt \
  --save_dir ./logs

The script can be found at:

Device:

Other details:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

Please let me know what could be wrong with this.
I could not figure out what is causing this cuDNN error.
For completeness, I have rented the GPUs from paperspace.com to test a virtual try-on diffusion model.

@ptrblck
Hi, could you please take a look? Thank you in advance!

There is a way to get a more detailed error message than just the error enum by setting some environment variables: Troubleshooting — NVIDIA cuDNN 9.0.0 documentation. I suggest you try that and then upload the log file you get. You can prepend this to your command line: `CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=$HOME/my_output_log.txt`
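
Putting the logging variables together with the command from the original post, the full invocation would look something like this (paths are taken from the post; the log destination is just an example, adjust to your setup):

```shell
CUDNN_LOGLEVEL_DBG=3 \
CUDNN_LOGDEST_DBG=$HOME/my_output_log.txt \
CUDA_VISIBLE_DEVICES=2 python inference.py \
  --config_path ./configs/VITONHD.yaml \
  --batch_size 1 \
  --model_load_path ./ckpts/viton.ckpt \
  --save_dir ./logs
```

The resulting log file should contain the specific cuDNN API call and parameters that triggered the internal error, which is much more useful than the bare status enum.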

Could you rerun your workload with the latest stable or nightly PyTorch release, if not already done, and check if you are still running into the same issue?
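
Before rerunning, it may also help to post the exact versions your environment is using, since a mismatch between the PyTorch build's CUDA/cuDNN and the driver is a common source of this error. A minimal sketch of such a check (standard `torch` introspection APIs, nothing specific to StableVITON):

```python
import torch

# Report the versions PyTorch was built against and whether the GPU is visible.
# cudnn.version() returns None if the build has no cuDNN support.
print(f"PyTorch:        {torch.__version__}")
print(f"CUDA (build):   {torch.version.cuda}")
print(f"cuDNN:          {torch.backends.cudnn.version()}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"Device:         {torch.cuda.get_device_name(0)}")
```

Including this output alongside the cuDNN log makes it much easier to spot a version mismatch.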