RuntimeError: CUDA error: device-side assert triggered at a certain iteration

Could you rerun the code via:

CUDA_LAUNCH_BLOCKING=1 python script.py args

and post the stack trace here?
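If you launch the run from another Python process, for example a notebook or a job wrapper, where the shell prefix isn't convenient, a minimal sketch of the same rerun could look like this. `run_with_blocking_launches` is a hypothetical helper name, not a PyTorch API; it just copies the current environment, sets `CUDA_LAUNCH_BLOCKING=1` and starts the script in a child interpreter, so the variable is in place before CUDA is initialized:

```python
import os
import subprocess
import sys


def run_with_blocking_launches(script, *args):
    """Hypothetical helper: re-run `script` with synchronous CUDA kernel launches.

    Roughly equivalent to `CUDA_LAUNCH_BLOCKING=1 python script args` in a POSIX shell.
    """
    # Copy the current environment and force synchronous launches, so the
    # reported stack trace points at the operation that triggered the assert.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    result = subprocess.run(
        [sys.executable, script, *args],
        env=env,
        capture_output=True,
        text=True,
    )
    # The Python traceback (including the CUDA error) is written to stderr.
    return result.returncode, result.stderr
```

The stderr text it returns is the stack trace that would be useful to post here.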
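Alternatively, you could also run the script on the CPU, which should give you a clearer error message, since CPU operations run synchronously and usually report the failing check directly (e.g. an out-of-range index).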
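One way to try the CPU run without editing the script is to hide all GPUs from CUDA. This is a sketch under an assumption: it only works if the script selects its device with a check such as `device = "cuda" if torch.cuda.is_available() else "cpu"`; a script that hardcodes `.cuda()` would instead fail with a "no GPU available" style error. `run_on_cpu_only` is a hypothetical helper name:

```python
import os
import subprocess
import sys


def run_on_cpu_only(script, *args):
    """Hypothetical helper: re-run `script` with every GPU hidden from CUDA.

    Assumes the script falls back to the CPU when no GPU is visible.
    """
    # An empty CUDA_VISIBLE_DEVICES hides all GPUs, so torch.cuda.is_available()
    # returns False inside the child process.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES="")
    result = subprocess.run(
        [sys.executable, script, *args],
        env=env,
        capture_output=True,
        text=True,
    )
    # On the CPU the failing operation raises directly, so stderr should
    # contain a more specific error than the generic device-side assert.
    return result.returncode, result.stderr
```

If the script hardcodes the device, changing it to `torch.device("cpu")` manually achieves the same thing.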