CUDA error: device-side assert triggered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569)

Do you see the same error if you run the code on the CPU?
This might yield a clearer error message than the current CUDA one.
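
As a rough illustration of why the CPU run helps: device-side asserts are often (though not always, and not confirmed for your script) caused by an out-of-range index, e.g. in an embedding lookup or a loss target. The sketch below is a hypothetical reproduction, not your code; the layer sizes and index values are made up. On the CPU, the same mistake raises a readable Python exception instead of an opaque CUDA assert:

```python
import torch

# Hypothetical setup: an embedding table with 10 rows, so valid indices are 0..9.
embedding = torch.nn.Embedding(num_embeddings=10, embedding_dim=3)

# The last index (10) is out of range on purpose.
bad_indices = torch.tensor([1, 5, 10])

caught = None
try:
    # Nothing was moved to a GPU, so this lookup runs on the CPU.
    embedding(bad_indices)
except IndexError as err:
    caught = err
    print(f"CPU error: {err}")
```

If your script fails in a similar way on the CPU, the message and stack trace should point to the offending operation directly.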

If it’s working fine on the CPU, could you rerun the code using

CUDA_LAUNCH_BLOCKING=1 python script.py args

and post the stack trace again?
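
If setting the variable on the command line is inconvenient (e.g. on Windows or from an IDE), a small launcher can set it for you. This is only a sketch; `run_blocking` is a made-up helper name, and `script.py args` stands in for your actual command. With `CUDA_LAUNCH_BLOCKING=1`, kernel launches run synchronously, so the stack trace should point to the operation that actually failed:

```python
import os
import subprocess
import sys


def run_blocking(cmd):
    """Run `cmd` in a child process with CUDA_LAUNCH_BLOCKING=1 set."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(cmd, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Placeholder command; replace with your own script and arguments.
    result = run_blocking([sys.executable, "script.py", "args"])
    print(result.stdout)
    print(result.stderr, file=sys.stderr)
```

The variable has to be set before CUDA is initialized, which is why the sketch passes it to a fresh process rather than changing it midway through a running script.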

PS: You can post code directly by wrapping it in three backticks ``` :wink: