CUDA error : device-side assert triggered

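Set CUDA_LAUNCH_BLOCKING=1 to get a more accurate traceback. CUDA kernels launch asynchronously, so by default the error can be reported at a later, unrelated call; with this variable set, each launch blocks until it finishes, and the traceback points at the operation that actually triggered the assert.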
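
One way to apply the setting is sketched below. The variable has to be in the environment before the CUDA runtime initializes, so this sketch starts the failing program in a fresh child process with the variable already set. The helper name run_with_blocking_launches is hypothetical, and the child command here is only a stand-in that echoes the variable; in practice you would pass the command that reproduces the error.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(argv):
    """Run argv in a child process with CUDA_LAUNCH_BLOCKING=1 set.

    Hypothetical helper: the child inherits the current environment
    plus the extra variable, so CUDA sees it from the very start.
    """
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(argv, env=env, capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in child command (an assumption for illustration): it only
    # prints the variable to show that the child process received it.
    # Replace it with the script that triggers the device-side assert.
    child = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('CUDA_LAUNCH_BLOCKING'))",
    ]
    result = run_with_blocking_launches(child)
    print(result.stdout.strip())  # prints 1
```

Synchronous launches slow the program down noticeably, so this is best kept as a debugging setting and removed once the failing operation has been found.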