Is `map_location` in `torch.load()` and `model.load_state_dict()` independent from `device` in `.to()`?

Hello community,

When I have a model on the CPU and then call model.load_state_dict(torch.load(PATH, map_location=device)), as explained here, model.device doesn’t return the device specified via map_location but “cpu” instead. I then have to call model.to(device) to get it onto the desired device.

Yet when I load a plain Tensor stored on an HDD with my_tensor = torch.load(PATH, map_location=device), my_tensor.device does return the device specified via map_location.

Why is that? Does load_state_dict behave in a special way? Or do I also need to call my_tensor = my_tensor.to(device) after my_tensor = torch.load(PATH, map_location=device)?

And can I do my_tensor = torch.load(PATH, map_location="cpu") and then my_tensor = my_tensor.to("cuda:0")? I don’t quite get whether the two are related, whether they need to be consistent, or whether they should be performed one after the other.

Thanks for the help.

Hi,

map_location changes the device of the Tensors in the state dict that torch.load returns.
But when you call load_state_dict(), only the values from that state dict are copied into the model’s existing parameters and buffers. That does not change the model’s device! You will need to move the model itself with .to() if you want it on a different device.
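A minimal sketch of the pattern, using a tiny nn.Linear as a stand-in for your model and a placeholder checkpoint file, and assuming a CUDA device is available:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0")     # assumes a CUDA device is available

model = nn.Linear(4, 2)             # stand-in for your model, created on the CPU
torch.save(model.state_dict(), "checkpoint.pt")   # placeholder checkpoint path

# map_location puts the tensors *inside the returned state dict* on cuda:0 ...
state_dict = torch.load("checkpoint.pt", map_location=device)
print(state_dict["weight"].device)               # cuda:0

# ... but load_state_dict only copies those values into the module's
# existing (CPU) parameters; it does not move the module itself.
model.load_state_dict(state_dict)
print(next(model.parameters()).device)           # cpu

# Moving the module is a separate, explicit step.
model.to(device)
print(next(model.parameters()).device)           # cuda:0
```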


Ok, thanks for the answer. So the model is kind of separate from its weights.

Now, talking about Tensors: do I need to call my_tensor = my_tensor.to(device) after my_tensor = torch.load(PATH, map_location=device)?
In other words, are my_tensor = torch.load(PATH, map_location="cpu") followed by my_tensor = my_tensor.to("cuda:0") and my_tensor = torch.load(PATH, map_location="cuda:0") exactly the same?

Yes, the end result is the same, so there is no need to add a .to() if you already used map_location properly.
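A small sketch of that equivalence, with a placeholder file name and again assuming a CUDA device is available:

```python
import torch

torch.save(torch.randn(3), "tensor.pt")   # placeholder file

# Path A: load to CPU first, then move.
a = torch.load("tensor.pt", map_location="cpu").to("cuda:0")

# Path B: let map_location place it directly.
b = torch.load("tensor.pt", map_location="cuda:0")

print(a.device, b.device)   # both cuda:0
print(torch.equal(a, b))    # True, same values
```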


Thanks for the crystal-clear answer! :slightly_smiling_face: