Compile seems to be targeting cuda:0 rather than the device the model is on

torch.compile works great when I only have one GPU (a GTX 1660 Super, no tensor cores) installed in my computer. However, after I installed a new RTX 3060 (which has tensor cores), it became the new "cuda:0". Even though I still chose the GTX 1660 (now "cuda:1") to run the model, torch.compile still picks the RTX 3060's tensor-core architecture as the target, and I get the following warning:

UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
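For reference, the setting the warning suggests is a one-line global switch; this is just a minimal sketch of applying it (it only changes matmul behavior on GPUs that actually have TF32 tensor cores, such as the RTX 3060):

```python
import torch

# Enable TF32 tensor-core matmuls for float32, as the warning suggests.
# This is a process-wide setting, not per-device.
torch.set_float32_matmul_precision("high")

# Confirm the current setting.
print(torch.get_float32_matmul_precision())
```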

and this error:

", line 1671, in _init_handles
    mod, func, n_regs, n_spills = cuda_utils.load_binary(self.metadata["name"], self.asm["cubin"], self.shared, device)
RuntimeError: Triton Error [CUDA]: device kernel image is invalid

Is there a way I can specify the device to the torch.compile() function?
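As far as I know, torch.compile() itself has no device parameter; the backend compiles for whichever CUDA device is current when the compiled model first runs. A hedged workaround sketch, assuming the two-GPU setup described above (the device string "cuda:1" for the GTX 1660 is taken from the post; it falls back to CPU when a second GPU is not present):

```python
import torch

# Pick the intended device: the GTX 1660 is "cuda:1" in this setup.
device = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

if device != "cpu":
    # Make cuda:1 the current CUDA device before anything is compiled,
    # so the backend does not default to cuda:0 (the RTX 3060).
    torch.cuda.set_device(device)

model = torch.nn.Linear(16, 16).to(device)

# Compilation itself is deferred until the first forward call,
# which is why the current device must be set up front.
compiled = torch.compile(model)
print(type(compiled).__name__)
```

Another common way to hide the RTX 3060 entirely is to launch with CUDA_VISIBLE_DEVICES=1, so the GTX 1660 becomes the only visible device.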

Hmm, we might need to clean up the warning to work better in a multi-GPU setup. As for not respecting the device, that seems like a bug, so please open an issue here: Issues · pytorch/pytorch · GitHub

I have created an issue: Compile targts cuda:0 rather than the device the model is on · Issue #97693 · pytorch/pytorch · GitHub