F.interpolate and Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

I use

F.interpolate(g, size=input_size[2:], mode=upsample_mode)

in the forward method and get the following error when running the script on the server:

File "/usr/local/lib/python3.10/dist-packages/torch/_decomp/decompositions.py", line 2821, in upsample_bilinear2d
    x_ceil = torch.ceil(x).clamp(max=in_h - 1).to(torch.int64)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument max in method wrapper_CUDA_clamp_Tensor)
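For context, the call sits in the forward method roughly like this (a sketch: the surrounding module is my reconstruction, only g, input_size[2:] and upsample_mode are from the actual code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UpsampleBlock(nn.Module):
    # hypothetical module wrapping the failing call
    def __init__(self, upsample_mode: str = "bilinear"):
        super().__init__()
        self.upsample_mode = upsample_mode

    def forward(self, x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
        input_size = x.size()  # torch.Size([N, C, H, W])
        # the target size is derived from another tensor at runtime,
        # not hard-coded
        return F.interpolate(g, size=input_size[2:], mode=self.upsample_mode)
```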

When running the same model locally on a GPU, this error does not appear.

This error also does not appear on the server when directly specifying the size, e.g. F.interpolate(g, size=(64, 64), mode=upsample_mode), but I need the size to be calculated, not hard-coded.

In the following thread I have seen a related suggestion: `F.interpolate` uses incorrect size when `align_corners=True` · Issue #76487 · pytorch/pytorch · GitHub

Do you know any reason for this problem or any workaround? I do not think this is related to moving the model or tensors to CUDA.

I don’t know where the device mismatch comes from, so could you print all inputs to F.interpolate?
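For example, something like this right before the call (a sketch; adapt the names to your code):

```python
# print type, device, and value of everything passed to F.interpolate
print(type(g), g.device, g.dtype, g.shape)
print(type(input_size[2:]), input_size[2:], [type(s) for s in input_size[2:]])
print(upsample_mode)
```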

It seems to be related to the following issue: [pt2] The min and max parameters of torch.clamp do not support numpy format · Issue #94174 · pytorch/pytorch · GitHub, i.e. somehow related to how torch.clamp handles its min/max arguments.
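A minimal reduction of that clamp behavior (my own sketch, not taken from the issue):

```python
import torch

x = torch.randn(4, device="cuda")

# a plain Python number as the bound is fine:
x.clamp(max=63)

# a CPU scalar *tensor* as the bound reproduces the reported error:
try:
    x.clamp(max=torch.tensor(63))
except RuntimeError as e:
    print(e)  # Expected all tensors to be on the same device, ... cuda:0 and cpu!
```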

I am not completely able to follow the explanation of what is happening with dynamo and inductor, but when I downgraded PyTorch to 2.0.1 on the server (to match the local machine, where it worked without this error), it worked on the server as well.

So basically, when running the same code on the server in this image: PyTorch Release 23.05 - NVIDIA Docs, or locally with PyTorch 2.0.1, F.interpolate does not throw the device mismatch error.
When running in the rel-23-08 image, it throws the above-mentioned device mismatch error unless the size is explicitly fixed with numbers, e.g. size=(64, 64).
So I assume that some change between PyTorch 2.0.1 and PyTorch 2.1 causes this error.
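For anyone who wants to reproduce it, this is roughly the call I would expect to behave differently between the two versions (a sketch; it assumes the model is run through torch.compile, since the decomposition in the traceback is only reached when the call is compiled):

```python
import torch
import torch.nn.functional as F

def upsample(g: torch.Tensor, size) -> torch.Tensor:
    return F.interpolate(g, size=size, mode="bilinear")

g = torch.randn(1, 3, 32, 32, device="cuda")
input_size = torch.Size([1, 3, 64, 64])

compiled = torch.compile(upsample)
# reported: works on PyTorch 2.0.1, raises the device-mismatch error on 2.1
print(compiled(g, input_size[2:]).shape)
```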

As to the inputs to F.interpolate (see the sketch after this list):

  1. I have tried input_size[2:], which is a torch.Size - works on PyTorch 2.0.1, but not on PyTorch 2.1
  2. I have tried [int(h), int(w)] and (int(h), int(w)) - works on PyTorch 2.0.1, but not on PyTorch 2.1
  3. (64, 64) is the only one that worked on both PyTorch 2.0.1 and PyTorch 2.1
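In code (a sketch; h and w are computed upstream from input_size):

```python
# 1. torch.Size slice - works on 2.0.1, fails on 2.1:
F.interpolate(g, size=input_size[2:], mode=upsample_mode)

# 2. casting to plain Python ints - also works on 2.0.1 only:
h, w = input_size[2:]
F.interpolate(g, size=(int(h), int(w)), mode=upsample_mode)

# 3. hard-coded tuple - the only variant that works on both versions:
F.interpolate(g, size=(64, 64), mode=upsample_mode)
```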