I cannot get torch.load and map_location to work as expected. I have tried three of the suggested methods for loading a model onto the GPU using map_location (from the references listed below). The model always ends up on the CPU, even though the documentation seems to indicate that map_location can load tensors directly onto the GPU. (It even says this should happen by default on a machine with GPUs.) Here is a minimal working example, followed by further questions.
import torch
import torch.nn as nn
import torch.nn.functional as F
gpu = torch.device("cuda:0")
print(gpu)
cuda:0
# Build a simple model on the GPU
class Net(nn.Module):
    def __init__(self, indim):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(indim, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, t):
        t = F.relu(self.fc1(t))
        t = self.fc2(t)
        return t
net = Net(3).to(gpu)
# Force weights to zero so we can later confirm
# we've loaded the right model
with torch.no_grad():
    net.fc1.weight.fill_(0)
print(net.fc1.weight)
Parameter containing:
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]], device='cuda:0', requires_grad=True)
Notice that my model is on cuda:0.
# Save and delete the model
torch.save(net.state_dict(), 'my_gpu_mod.pth')
del net
Now we try three different ways to load the saved state dict, using map_location to target the GPU.
# Load with map_location=device
model = Net(3)
print(model.fc1.weight.max())
model.load_state_dict(torch.load("my_gpu_mod.pth",
                                 map_location=gpu))
print(model.fc1.weight.device)
print(model.fc1.weight.max())
tensor(0.5071, grad_fn=)
cpu
tensor(0., grad_fn=)
# Load with map_location=string
model = Net(3)
print(model.fc1.weight.max())
model.load_state_dict(torch.load("my_gpu_mod.pth",
                                 map_location="cuda:0"))
print(model.fc1.weight.device)
print(model.fc1.weight.max())
tensor(0.4118, grad_fn=)
cpu
tensor(0., grad_fn=)
# Load with map_location=lambda
model = Net(3)
print(model.fc1.weight.max())
model.load_state_dict(torch.load('my_gpu_mod.pth',
                                 map_location=lambda storage, loc: storage.cuda(0)))
print(model.fc1.weight.device)
print(model.fc1.weight.max())
tensor(0.5070, grad_fn=)
cpu
tensor(0., grad_fn=)
If map_location does not automatically put things on the GPU, why does it need to be used at all when loading a GPU-trained model on the same machine that trained it? I can simply call .to afterwards.
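A diagnostic sketch that may help separate the two steps involved. As far as I understand, map_location controls where torch.load places the tensors in the dictionary it returns, while load_state_dict copies those values into the parameters the module already has, so the module stays on whatever device it was constructed on. The single nn.Linear layer, the temporary file name, and the CPU fallback below are assumptions made for illustration and are not part of the example above.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Assumption: fall back to the CPU so the sketch also runs without CUDA.
target = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for Net: one linear layer is enough to check devices.
src = nn.Linear(3, 5).to(target)
path = os.path.join(tempfile.mkdtemp(), "check_devices.pth")
torch.save(src.state_dict(), path)

# Step 1: map_location decides where the loaded tensors live.
state = torch.load(path, map_location=target)
state_devices = {name: t.device for name, t in state.items()}
print(state_devices)

# Step 2: a freshly constructed module lives on the CPU, and
# load_state_dict copies values into those existing parameters.
model = nn.Linear(3, 5)
model.load_state_dict(state)
device_after_load = model.weight.device
print(device_after_load)

# Step 3: moving the module itself is a separate, explicit call.
model.to(target)
print(model.weight.device)
```

If the first print reports the target device while the second reports cpu, then map_location is doing its job on the loaded tensors, and the cpu result in the three attempts above would come from Net(3) having been constructed on the CPU rather than from torch.load.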
References