How to debug "RuntimeError: CUDA error: invalid configuration argument"?

Hello, I get the following error (with os.environ['CUDA_LAUNCH_BLOCKING'] = "1" set):

RuntimeError                              Traceback (most recent call last)
notebook.ipynb Cell 13 in <cell line: 19>()
     22 pred = model(input.to(device))
     23 loss = criterion(pred,label.to(device))
---> 24 loss.backward()
     25 optimizer.step()
     26 optimizer.zero_grad()

File c:\ProgramData\Anaconda3\lib\site-packages\torch\_tensor.py:396, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    387 if has_torch_function_unary(self):
    388     return handle_torch_function(
    389         Tensor.backward,
    390         (self,),
   (...)
    394         create_graph=create_graph,
    395         inputs=inputs)
--> 396 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File c:\ProgramData\Anaconda3\lib\site-packages\torch\autograd\__init__.py:173, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    168     retain_graph = create_graph
    170 # The reason we repeat same the comment below is that
    171 # some Python versions print out the first line of a multi-line function
    172 # calls in the traceback and some print out the last line
--> 173 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    175     allow_unreachable=True, accumulate_grad=True)

RuntimeError: CUDA error: invalid configuration argument

How can I debug this error? It appears when I run the backward pass on sparse tensors on CUDA. It does not appear when I run the same code on the CPU.

Could you post a minimal, executable code snippet as well as the output of python -m torch.utils.collect_env, please?
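
For reference, a minimal reproduction could look roughly like the sketch below. It is not the original model: the tensor shapes, the use of torch.sparse.mm, and the helper name sparse_backward are assumptions chosen only to exercise a sparse backward pass on each device. It runs the same step on the CPU and, if a GPU is available, on CUDA, so the failing device can be isolated.

```python
import os

# Set before the first CUDA call so kernel launches run synchronously
# and the traceback points at the operation that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch


def sparse_backward(device: str) -> torch.Tensor:
    """Run forward and backward through a sparse-dense matmul on `device`.

    Hypothetical stand-in for the real model; returns the gradient of the
    dense operand, moved to the CPU for comparison.
    """
    torch.manual_seed(0)
    # Small 3x3 sparse COO matrix with three non-zero entries (assumed data).
    indices = torch.tensor([[0, 1, 2], [1, 0, 2]])
    values = torch.tensor([1.0, 2.0, 3.0])
    sparse = torch.sparse_coo_tensor(indices, values, (3, 3), device=device)
    # Dense operand that receives the gradient.
    weight = torch.randn(3, 4, device=device, requires_grad=True)
    loss = torch.sparse.mm(sparse, weight).sum()
    loss.backward()
    return weight.grad.cpu()


cpu_grad = sparse_backward("cpu")
print("cpu grad:\n", cpu_grad)

if torch.cuda.is_available():
    # If the bug is device-specific, this call is where it should surface.
    cuda_grad = sparse_backward("cuda")
    print("max abs diff:", (cpu_grad - cuda_grad).abs().max().item())
else:
    print("CUDA not available; only the CPU path was run.")
```

If this sketch runs cleanly on CUDA, replace the sparse-dense matmul step by step with the operations from the real model until the error reappears, and post that version together with the collect_env output.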