CUDA execution error in scatter_()

I ran the model on a single GPU with multithreading; the details of the error are as follows.

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:145: operator(): block: [1103,0,0], thread: [94,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:145: operator(): block: [1103,0,0], thread: [95,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/home/cy/data/envs/nnunetv2/bin/nnUNetv2_train", line 33, in <module>
    sys.exit(load_entry_point('nnunetv2-cy', 'console_scripts', 'nnUNetv2_train')())
  File "/data/cy/projects/nnUNetV2/nnunetv2/run/run_training.py", line 258, in run_training_entry
    run_training(args.dataset_name_or_id, args.configuration, args.fold, args.tr, args.p, args.pretrained_weights,
  File "/data/cy/projects/nnUNetV2/nnunetv2/run/run_training.py", line 201, in run_training
    nnunet_trainer.run_training()
  File "/data/cy/projects/nnUNetV2/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 1285, in run_training
    train_outputs.append(self.train_step(next(self.dataloader_train)))
  File "/data/cy/projects/nnUNetV2/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py", line 886, in train_step
    output = self.network(data)
  File "/home/cy/data/envs/nnunetv2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/cy/projects/nnUNetV2/nnunetv2/network_architecture/e2unet.py", line 62, in forward
    return self.decoder(skips)
  File "/home/cy/data/envs/nnunetv2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/cy/projects/nnUNetV2/nnunetv2/network_architecture/custom_module/refine_decoder.py", line 104, in forward
    x = self.feature_refine[s](x)
  File "/home/cy/data/envs/nnunetv2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/cy/projects/nnUNetV2/nnunetv2/network_architecture/custom_module/feature_refine/modules/deformable_refine.py", line 119, in forward
    updated_x = updated_x.scatter_(dim=2, index=linear_pos, src=x_sampled)
RuntimeError: CUDA error: device-side assert triggered

I have no idea what causes the problem: I’ve checked the sizes of all three tensors (updated_x, linear_pos, x_sampled) and the indices in linear_pos, and nothing seems wrong.

The error points towards invalid indices. Since you’ve already checked these, what’s the shape of all tensors and the min./max. index of linear_pos?
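
If it helps, here is a minimal sketch for collecting those numbers right before the scatter_ call. The helper name describe_scatter_inputs and the stand-in tensors are my own assumptions, not part of your code; in your forward you would pass the real updated_x, linear_pos and x_sampled instead.

```python
import torch


def describe_scatter_inputs(target, index, src):
    """Collect the shapes, index dtype and index range used by target.scatter_(dim, index, src)."""
    return {
        "target_shape": tuple(target.shape),
        "index_shape": tuple(index.shape),
        "src_shape": tuple(src.shape),
        "index_dtype": index.dtype,
        # int(...) copies the value to the host, so this synchronizes on CUDA.
        "index_min": int(index.min()),
        "index_max": int(index.max()),
    }


# Stand-in tensors (hypothetical sizes), created on the CPU so the snippet runs anywhere.
B, C, H, W, h, w = 2, 3, 4, 4, 2, 2
updated_x = torch.zeros(B, C, H * W)
x_sampled = torch.randn(B, C, h * w)
linear_pos = torch.randint(0, H * W, (B, C, h * w))

info = describe_scatter_inputs(updated_x, linear_pos, x_sampled)
print(info)
```

Logging this dictionary on every call would show whether index_max ever reaches the size of the scattered dimension, or index_min ever drops below 0.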

Sometimes it runs smoothly for a few epochs and then reports the above error, but sometimes it reports the error immediately after it starts running.

Thank you very much for your reply!
The shapes of updated_x, linear_pos and x_sampled are (B, C, H*W), (B, C, h*w) and (B, C, h*w) respectively. The maximum index of linear_pos is H*W-1, and the minimum index is 0.

Thanks for the shapes and range!
This setup works properly:

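import torch

B, C, H, W = 2, 3, 4, 4

device = "cuda"
updated_x = torch.randn(B, C, H, W, device=device)
x_sampled = torch.randn(B, C, H, W, device=device)
# Indices address dim=2 (size H), so they must lie in [0, H - 1].
linear_pos = torch.randint(0, H, (B, C, H, W), device=device)

# In-place scatter along dim=2; runs without triggering the device-side assert.
updated_x.scatter_(dim=2, index=linear_pos, src=x_sampled)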

so I assume you have some randomness in your process and the indices are not always in this range, which would also explain why the code works sometimes.
You would thus still need to check the shapes and index range in the failing iteration, not in the working one.
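
One way to catch the failing iteration is to validate the indices right before every scatter_ call and raise a readable error, rather than waiting for the asynchronous device-side assert. The sketch below assumes a hypothetical wrapper named checked_scatter_ and an optional step argument for the iteration counter; neither exists in PyTorch or in your code, and the demo tensors are made up.

```python
import torch


def checked_scatter_(target, dim, index, src, step=None):
    """Check that every index is valid along `dim`, then run target.scatter_ in place."""
    size = target.size(dim)
    bad = (index < 0) | (index >= size)
    if bad.any():  # forces a host sync on CUDA, so use this for debugging only
        bad_vals = index[bad]
        raise IndexError(
            f"step {step}: {int(bad.sum())} of {index.numel()} indices fall outside "
            f"[0, {size - 1}] (min={int(bad_vals.min())}, max={int(bad_vals.max())}); "
            f"target={tuple(target.shape)}, index={tuple(index.shape)}, src={tuple(src.shape)}"
        )
    return target.scatter_(dim, index, src)


# Demo on the CPU with made-up sizes matching the (B, C, H*W) / (B, C, h*w) layout.
B, C, H, W, h, w = 2, 3, 4, 4, 2, 2
updated_x = torch.zeros(B, C, H * W)
x_sampled = torch.randn(B, C, h * w)
linear_pos = torch.randint(0, H * W, (B, C, h * w))

checked_scatter_(updated_x, 2, linear_pos, x_sampled, step=0)  # valid indices pass

linear_pos[0, 0, 0] = H * W  # one past the last valid index
try:
    checked_scatter_(updated_x, 2, linear_pos, x_sampled, step=1)
except IndexError as err:
    print(err)
```

Since the check copies a flag back to the host on every call, it slows training down and is best removed once the bad iteration is found. Running the script with the environment variable CUDA_LAUNCH_BLOCKING=1 can also help, because the stack trace then points at the kernel that actually failed.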