I am running inference on a PyTorch model using 2 GPUs.
The error I get is:
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
Complete stack:
Loaded model config from [None]
0/2032
Traceback (most recent call last):
  File "/home/paperspace/StableVITON-new/inference.py", line 110, in <module>
    main(args)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/inference.py", line 68, in main
    z, c = model.get_input(batch, params.first_stage_key)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/cldm/cldm.py", line 73, in get_input
    x, c = super().get_input(batch, self.first_stage_key, *args, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/models/diffusion/ddpm.py", line 813, in get_input
    encoder_posterior = self.encode_first_stage(x)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/models/diffusion/ddpm.py", line 897, in encode_first_stage
    return self.first_stage_model.encode(x)
  File "/home/paperspace/StableVITON-new/ldm/models/autoencoder.py", line 85, in encode
    h = self.encoder(x)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paperspace/StableVITON-new/ldm/modules/diffusionmodules/model.py", line 523, in forward
    hs = [self.conv_in(x)]
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/paperspace/.local/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
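The failure is raised from the very first F.conv2d call (the autoencoder's conv_in layer). As a sanity check, I can run a single cuDNN convolution in isolation with a minimal standalone snippet like the one below (not part of StableVITON; the channel counts and input size are just assumed for the test):

import torch
import torch.nn as nn

# Standalone check: push one convolution through cuDNN on the visible GPU.
# If this also raises CUDNN_STATUS_INTERNAL_ERROR, the problem is in the
# environment (driver / cuDNN / GPU memory) rather than in the model code.
device = torch.device("cuda")
conv = nn.Conv2d(3, 128, kernel_size=3, padding=1).to(device)
x = torch.randn(1, 3, 512, 384, device=device)  # example input size, assumed
with torch.no_grad():
    y = conv(x)
print("conv2d ok:", y.shape)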
The command I am running is:
CUDA_VISIBLE_DEVICES=2 python inference.py \
  --config_path ./configs/VITONHD.yaml \
  --batch_size 1 \
  --model_load_path ./ckpts/viton.ckpt \
  --save_dir ./logs
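To double-check which devices the process actually sees when launched with CUDA_VISIBLE_DEVICES=2, I can also run this small standalone snippet (not part of the repo):

import torch

# Report the devices visible to this process after CUDA_VISIBLE_DEVICES filtering.
print("visible device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))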
The script can be found at:
Device:
Other details:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
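For reference, the CUDA and cuDNN versions that PyTorch itself reports (which may differ from the system nvcc shown above) can be printed with:

import torch

# Versions seen by PyTorch, as opposed to the system-wide CUDA toolkit.
print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("cuDNN version:", torch.backends.cudnn.version())
print("cuDNN enabled:", torch.backends.cudnn.enabled)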
Please let me know what could be wrong here. I have not been able to figure out what is causing this cuDNN error.
For completeness: the GPUs are rented from paperspace.com, and I am using them to test a virtual try-on diffusion model.