When I get a model on CPU and then do model.load_state_dict(torch.load(PATH, map_location=device)) as explained here, model.device doesn’t return the device specified in map_location but “cpu” instead. I then have to call model.to(device) to get it on the desired device.
Yet when I load a plain Tensor stored on an HDD with my_tensor = torch.load(PATH, map_location=device), my_tensor.device does return the device specified in map_location.
Why is that? Is it load_state_dict that behaves in a special way? Or do I also need to do my_tensor = my_tensor.to(device) after my_tensor = torch.load(PATH, map_location=device)?
And can I do my_tensor = torch.load(PATH, map_location="cpu") and then my_tensor = my_tensor.to("cuda:0")? I don’t quite get whether the two are related, and whether they should be consistent and performed one after the other.
The map_location argument changes the device of the Tensors in the state dict that is returned.
But when you call load_state_dict(), only the values are copied into the model. That does not change the model’s device! You will need to move the model itself with .to() if you want it on a different device.
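A minimal sketch of this behavior (the model and its layer sizes are arbitrary; an in-memory buffer stands in for PATH so it runs anywhere):

```python
import io

import torch
import torch.nn as nn

# A tiny example model living on CPU.
model = nn.Linear(4, 2)

# Save its state dict to an in-memory buffer (a stand-in for a file at PATH).
buf = io.BytesIO()
torch.save(model.state_dict(), buf)
buf.seek(0)

# map_location controls the device of the tensors in the returned state dict.
state = torch.load(buf, map_location="cpu")

# load_state_dict() copies the values into the model's existing parameters;
# it does NOT move the model to another device.
model.load_state_dict(state)
print(next(model.parameters()).device)  # cpu

# To actually place the model on a device, call .to() explicitly.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```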
Ok, thanks for the answer. So the model is kind of separate from its weights.
Now talking about Tensors: do I need to do my_tensor = my_tensor.to(device) after my_tensor = torch.load(PATH, map_location=device)?
In other words, are my_tensor = torch.load(PATH, map_location="cpu") followed by my_tensor = my_tensor.to("cuda:0") VERSUS my_tensor = torch.load(PATH, map_location="cuda:0") exactly the same?
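For illustration, a sketch comparing the two routes (an in-memory buffer stands in for PATH; the target falls back to "cpu" when no GPU is available, so substitute "cuda:0" directly on a GPU machine):

```python
import io

import torch

# Save a tensor to an in-memory buffer (a stand-in for a file at PATH).
buf = io.BytesIO()
torch.save(torch.randn(3), buf)

target = "cuda:0" if torch.cuda.is_available() else "cpu"

# Route 1: load onto CPU, then move with .to().
buf.seek(0)
t1 = torch.load(buf, map_location="cpu")
t1 = t1.to(target)

# Route 2: load directly onto the target device via map_location.
buf.seek(0)
t2 = torch.load(buf, map_location=target)

print(t1.device == t2.device)            # True: both end on the same device
print(torch.equal(t1.cpu(), t2.cpu()))   # True: with the same values
```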