I am getting the following error. How can I solve it?

2023-11-29 11:01:06.577786: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-29 11:01:06.577851: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-29 11:01:06.577893: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-29 11:01:07.971136: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
11:01:09 - INFO: Starting epoch 0:
100% 210/210 [03:56<00:00, 1.13s/it, MSE=0.239]
11:05:05 - INFO: Sampling 10 new images…
0it [00:00, ?it/s]…/aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
…/aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [0,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
…/aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [0,0,0], thread: [2,0,0] Assertion srcIndex < srcSelectDimSize failed.
…/aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [0,0,0], thread: [3,0,0] Assertion srcIndex < srcSelectDimSize failed…
0it [00:00, ?it/s]
Traceback (most recent call last):
File "/content/drive/MyDrive/Diffusion-Models-pytorch/ddpm_conditional.py", line 128, in
launch()
File "/content/drive/MyDrive/Diffusion-Models-pytorch/ddpm_conditional.py", line 124, in launch
train(args)
File "/content/drive/MyDrive/Diffusion-Models-pytorch/ddpm_conditional.py", line 102, in train
sampled_images = diffusion.sample(model, n=len(labels), labels=labels)
File "/content/drive/MyDrive/Diffusion-Models-pytorch/ddpm_conditional.py", line 48, in sample
predicted_noise = model(x, t, labels)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/content/drive/MyDrive/Diffusion-Models-pytorch/modules.py", line 234, in forward
x1 = self.inc(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/content/drive/MyDrive/Diffusion-Models-pytorch/modules.py", line 76, in forward
return self.double_conv(x)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/container.py", line 215, in forward
input = module(input)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 460, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: GET was unable to find an engine to execute this computation

An indexing operation is failing:

…/aten/src/ATen/native/cuda/Indexing.cu:1239: indexSelectSmallIndex: block: [0,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.

Rerun the code via CUDA_LAUNCH_BLOCKING=1 python script.py args to narrow down which indexing op fails and fix it.
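
If prefixing the command is awkward (for example in a Colab or Jupyter notebook), a minimal sketch of the same idea is to set the variable from Python instead. This assumes the snippet runs before anything initialises CUDA, which is why it is placed ahead of `import torch`. It is an illustration only and not part of the original script.

```python
import os

# Assumption: this runs first, before any code that touches CUDA
# (safest is before `import torch`); set later, it may have no effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Quick confirmation that the variable is visible to this process.
print(os.environ["CUDA_LAUNCH_BLOCKING"])
```

With blocking launches enabled, the Python stack trace should point at the operation that actually triggered the assert rather than at a later, unrelated call.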


Thank you, sir, for the quick response.
root@09622d7731fa:/workspace/Diffusion-Models-pytorch-main# CUDA_LAUNCH_BLOCKING=1 python ddpm_conditional.py args
usage: ddpm_conditional.py [-h]
ddpm_conditional.py: error: unrecognized arguments: args.
But when I run it without args, it gives me this:

root@09622d7731fa:/workspace/Diffusion-Models-pytorch-main# CUDA_LAUNCH_BLOCKING=1 python ddpm_conditional.py
02:57:52 - INFO: Starting epoch 0:
100%|██████████| 1050/1050 [03:55<00:00, 4.47it/s, MSE=0.0154]
03:01:47 - INFO: Sampling 10 new images…
0it [00:00, ?it/s]/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [0,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [1,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [2,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [3,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [4,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [5,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [6,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [7,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [8,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [9,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [10,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1682343967769/work/aten/src/ATen/native/cuda/Indexing.cu:1093: indexSelectSmallIndex: block: [1,0,0], thread: [11,0,0] Assertion srcIndex < srcSelectDimSize failed.
And so on. At the end:

File "/workspace/Diffusion-Models-pytorch-main/ddpm_conditional.py", line 128, in
launch()
File "/workspace/Diffusion-Models-pytorch-main/ddpm_conditional.py", line 124, in launch
train(args)
File "/workspace/Diffusion-Models-pytorch-main/ddpm_conditional.py", line 102, in train
sampled_images = diffusion.sample(model, n=len(labels), labels=labels)
File "/workspace/Diffusion-Models-pytorch-main/ddpm_conditional.py", line 48, in sample
predicted_noise = model(x, t, labels)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/workspace/Diffusion-Models-pytorch-main/modules.py", line 232, in forward
t += self.label_emb(y)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
return F.embedding(
File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: CUDA error: device-side assert triggered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

This embedding operation is failing, so your input tensor contains values that are out of bounds for the embedding lookup table. Every index passed to the embedding must lie in [0, num_embeddings - 1].
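
One way to catch this before the CUDA kernel asserts is to compare the label range with the size of the embedding table on the host. The sketch below assumes the labels are an integer tensor and the lookup table is an `nn.Embedding` (like `self.label_emb` in the traceback). The helper name `check_labels` and the sizes 5 and 10 are made up for illustration and are not taken from the repository.

```python
import torch
import torch.nn as nn


def check_labels(labels: torch.Tensor, emb: nn.Embedding) -> None:
    """Raise a readable error if any label falls outside the embedding table."""
    lo, hi = int(labels.min()), int(labels.max())
    if lo < 0 or hi >= emb.num_embeddings:
        raise ValueError(
            f"labels span [{lo}, {hi}] but the embedding has only "
            f"{emb.num_embeddings} rows (valid indices 0..{emb.num_embeddings - 1})"
        )


# Hypothetical sizes, chosen only to show a passing and a failing case.
label_emb = nn.Embedding(num_embeddings=5, embedding_dim=8)
good = torch.arange(5)   # indices 0..4, all valid
bad = torch.arange(10)   # indices 0..9, five of them out of range

check_labels(good, label_emb)  # passes silently

try:
    check_labels(bad, label_emb)
except ValueError as err:
    message = str(err)
    print(message)
```

In the failing script, an equivalent check would go just before the `diffusion.sample(...)` call, using the model's own embedding layer in place of `label_emb`. If it fires, either the labels or the number of classes the embedding was built with needs to change so that the two agree.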
