Compile seems to be targeting cuda:0 rather than the device the model is on

torch.compile works great when I only have one GPU (a GTX 1660 Super, no tensor cores) installed in my computer. However, after I installed a new RTX 3060 (which has tensor cores), it became the new "cuda:0". Even though I still chose the GTX 1660 (now "cuda:1") to run the model, torch.compile still picks the RTX 3060's tensor-core architecture as the target, and I get the following warning:

UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
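For reference, the setting the warning suggests is a one-line global switch; this is just a minimal sketch of applying it (it only changes matmul behavior on GPUs that actually have TF32 tensor cores, such as the RTX 3060):

```python
import torch

# Enable TF32 tensor-core matmuls for float32, as the warning suggests.
# This is a process-wide setting, not per-device.
torch.set_float32_matmul_precision("high")

# Confirm the current setting.
print(torch.get_float32_matmul_precision())
```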

and this error:

", line 1671, in _init_handles
    mod, func, n_regs, n_spills = cuda_utils.load_binary(self.metadata["name"], self.asm["cubin"], self.shared, device)
RuntimeError: Triton Error [CUDA]: device kernel image is invalid

Is there a way I can specify the device to the torch.compile() function?
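As far as I know, torch.compile() itself has no device parameter; the backend compiles for whichever CUDA device is current when the compiled model first runs. A hedged workaround sketch, assuming the two-GPU setup described above (the device string "cuda:1" for the GTX 1660 is taken from the post; it falls back to CPU when a second GPU is not present):

```python
import torch

# Pick the intended device: the GTX 1660 is "cuda:1" in this setup.
device = "cuda:1" if torch.cuda.device_count() > 1 else "cpu"

if device != "cpu":
    # Make cuda:1 the current CUDA device before anything is compiled,
    # so the backend does not default to cuda:0 (the RTX 3060).
    torch.cuda.set_device(device)

model = torch.nn.Linear(16, 16).to(device)

# Compilation itself is deferred until the first forward call,
# which is why the current device must be set up front.
compiled = torch.compile(model)
print(type(compiled).__name__)
```

Another common way to hide the RTX 3060 entirely is to launch with CUDA_VISIBLE_DEVICES=1, so the GTX 1660 becomes the only visible device.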

Hmm, we might need to clean up the warning to work better in a multi-GPU setup. As for not respecting the device, that seems like a bug, so please open an issue here: Issues · pytorch/pytorch · GitHub

I have created an issue: Compile targts cuda:0 rather than the device the model is on · Issue #97693 · pytorch/pytorch · GitHub