I also had this issue, which seemed to occur on both CPU and GPU.
However, the issue went away on CPU after setting CUDA_VISIBLE_DEVICES="" first, which hides the GPUs from PyTorch. With that set, CPU inference using torch.compile ran without the error.
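For anyone who wants to try the same workaround from inside a script rather than the shell, here is a minimal sketch. The function name run_cpu_inference and the tiny Linear model are placeholders of mine, not the code that originally failed; the only part taken from my setup is the empty CUDA_VISIBLE_DEVICES.

```python
import os

# Hide all GPUs from PyTorch. This has to happen before CUDA is initialized,
# so it is set before torch is imported.
os.environ["CUDA_VISIBLE_DEVICES"] = ""


def run_cpu_inference():
    """Compile a placeholder model and run one forward pass on CPU."""
    import torch  # imported here so the env var above is already in place

    model = torch.nn.Linear(4, 2).eval()   # stand-in for the real model
    compiled = torch.compile(model)        # requires PyTorch 2.x
    with torch.no_grad():
        out = compiled(torch.randn(1, 4))
    return torch.cuda.is_available(), tuple(out.shape)
```

Calling run_cpu_inference() should report that CUDA is unavailable and return an output of shape (1, 2). As far as I know, the default torch.compile backend also needs a working C++ compiler on CPU, so a failure there would be a separate problem.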
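I assume that something is up with my CUDA config. My PyTorch version is 2.0.1+cu117 (built against CUDA 11.7), but my CUDA version is 12.1. In principle this should not be an issue, since as far as I know the PyTorch wheels ship their own CUDA runtime and only need a recent enough driver. I will try a fresh Docker image.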
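To compare setups, a small report like the one below may help. This is only a sketch: cuda_report is a name I made up, and it just collects what PyTorch itself says about its build.

```python
import importlib.util


def cuda_report():
    """Collect the version info relevant to a PyTorch/CUDA mismatch."""
    if importlib.util.find_spec("torch") is None:
        return {"torch": None}  # PyTorch is not installed in this environment
    import torch

    return {
        "torch": torch.__version__,              # e.g. 2.0.1+cu117
        "built_with_cuda": torch.version.cuda,   # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


print(cuda_report())
```

If built_with_cuda shows 11.7 while the system reports 12.1, that matches the mismatch described above.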