Could you run your code with:
CUDA_LAUNCH_BLOCKING=1 python script.py args
and post the stack trace here, please?
Also, does your code run on the CPU without any errors?
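If it's easier, here is a rough sketch of doing both runs from a small Python wrapper. The helper names (`debug_env`, `run_script`) are made up for illustration, and the CPU run assumes your script falls back to the CPU when no GPU is visible. If it hardcodes `.cuda()` calls, you would need to change the device in the script instead.

```python
import os
import subprocess
import sys


def debug_env(force_cpu=False):
    """Return a copy of the current environment set up for CUDA debugging.

    Hypothetical helper, not part of any library.
    """
    env = dict(os.environ)
    # Make kernel launches synchronous so the stack trace points to the failing op.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    if force_cpu:
        # Hide all GPUs from the child process (assumes the script falls back to CPU).
        env["CUDA_VISIBLE_DEVICES"] = ""
    return env


def run_script(argv, force_cpu=False):
    """Run `python <argv...>` in a child process with the debug environment."""
    return subprocess.run(
        [sys.executable, *argv],
        env=debug_env(force_cpu),
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Usage: python wrapper.py script.py args
    for cpu in (False, True):
        result = run_script(sys.argv[1:], force_cpu=cpu)
        print("CPU run" if cpu else "GPU run", "exit code:", result.returncode)
        print(result.stderr)
```

The variables are set on the child process before it starts, so they are in place before CUDA gets initialized.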