I recently ran into this discussion referencing the difference between `my_model.to(device)` and `my_tensor.to(device)`. While the model is moved to the device in-place, a tensor must be reassigned to the same variable to be moved to the device (`my_tensor = my_tensor.to(device)`).
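To make the asymmetry concrete, here is a minimal sketch (the device comments assume a CUDA-capable machine; on a CPU-only machine `device` resolves to `cpu` and the moves are no-ops):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
model.to(device)                        # moves the module's parameters in-place
print(next(model.parameters()).device)  # on `device`, no reassignment needed

tensor = torch.randn(3, 4)
tensor.to(device)                       # returns a new tensor; the original is unchanged
print(tensor.device)                    # still cpu (on a GPU machine)

tensor = tensor.to(device)              # reassignment is required for tensors
print(tensor.device)                    # now on `device`
```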
I have searched this forum for similar topics explaining this logic. There are a lot of discussions on the specific usage of `.to(device)`, but none that explicitly addresses why PyTorch implements this method differently for models vs. tensors.
From my point of view, the issue with this logic is that when using `my_model.to(device)` along with `my_tensor.to(device)`, a device mismatch can occur without the problem being directly visible. As a developer, you need to know which objects in the code are tensors and which are not (especially hard for PyTorch beginners). The logic is also inconsistent with other in-place methods, for example `.add_()`, which carry a trailing underscore `_` specifically for this reason.
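A minimal sketch of that silent pitfall, assuming a GPU is available (on a CPU-only machine both objects stay on the CPU and no error is raised):

```python
import torch
import torch.nn as nn

device = torch.device("cuda")  # assumes a CUDA device is available

model = nn.Linear(4, 2)
model.to(device)     # in-place for modules: the parameters now live on the GPU

x = torch.randn(1, 4)
x.to(device)         # return value silently discarded -- x is still on the CPU

out = model(x)       # RuntimeError: Expected all tensors to be on the same device
```

Nothing in the two `.to(device)` calls hints that only one of them took effect; the error only surfaces later, at the point where the model and the tensor interact.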
Is there any particular logic or design philosophy behind giving the same method name different behaviors? To me it seems counterintuitive that `.to(device)` behaves differently depending on the object type. Would it make sense to make the behaviour of `.to(device)` consistent for models and tensors, for example by having both `my_model.to(device)` and `my_tensor.to(device)` operate in-place?