CUDA error: an illegal memory access was encountered. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions

Cross-post from here. I would recommend sticking to one thread instead of creating multiple ones, as different users could otherwise end up debugging the same issue independently.