Device Side Assert on the backward() call

I have the following error:

Traceback (most recent call last):
  File "/home/s3092593/as-graph/slurm/../train.py", line 204, in <module>
    train(config)
  File "/home/s3092593/as-graph/slurm/../train.py", line 100, in train
    train_loss, train_runtime, train_accuracy, train_dist = train(model, train_loader, optimizer, criterion, runtime_evaluator, accuracy_evaluator,
  File "/home/s3092593/as-graph/deeper_gnn/train.py", line 76, in train
    scaler.scale(loss).backward()
  File "/data1/s3092593/thesis/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "/data1/s3092593/thesis/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: device-side assert triggered

I ran the code with CUDA_LAUNCH_BLOCKING=1. The error occurs after many batches, but the exact number changes from run to run.
When it happens, many assertion failures are also printed:

...
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [57,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [58,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [59,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [60,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [61,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [62,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1670525541990/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [0,0,0], thread: [63,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
...

However, I cannot find the source of the error.

The error happens only on the GPU, not on the CPU.

A scatter or gather operation is failing because of an invalid index. Check the stack trace to see which operation fails, then make sure its index tensor contains only valid indices: as the assertion says, every index must be >= 0 and smaller than the size of the dimension being indexed.
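As a rough sketch of that range check, something like the following could be called right before each suspect gather/scatter call in the forward pass. The helper name `check_index` and the example tensors are made up for illustration and are not taken from the original training code; the check forces a GPU sync, so it is meant for debugging only.

```python
import torch


def check_index(index: torch.Tensor, size: int, name: str = "index") -> None:
    """Raise IndexError if any entry of `index` falls outside [0, size).

    Hypothetical debugging helper, not part of PyTorch.
    """
    if index.numel() == 0:
        return  # nothing to validate
    lo, hi = int(index.min()), int(index.max())
    if lo < 0 or hi >= size:
        raise IndexError(
            f"{name}: values span [{lo}, {hi}], expected range [0, {size})"
        )


# gather along dim=1 requires every index value to lie in [0, src.size(1)).
src = torch.randn(4, 3)
good = torch.tensor([[0], [2], [1], [0]])
bad = torch.tensor([[0], [3], [1], [0]])  # 3 is out of range for size 3

check_index(good, src.size(1), "good")  # passes silently
out = src.gather(1, good)

try:
    check_index(bad, src.size(1), "bad")
except IndexError as err:
    print(err)  # reports the offending range instead of a device-side assert
```

Because the check raises a normal Python exception in the forward pass, the traceback points at the exact call with the bad index, and the offending batch can be saved for inspection.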