I trained my model on GPU and saved the model with
params = {k:v.cpu() for k, v in self.state_dict().items()}
torch.save(params, open('model.pt', 'wb'))
Then, when evaluating the model, I load it like this:
model = m(...)
params = torch.load('model.pt')
model.load_state_dict(params)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
However, I am getting much worse results on GPU than on CPU. On CPU it seems to perform correctly (I get results similar to those in the paper), but on GPU it is about 20% worse.
I wrote all my code to be device-agnostic, so apart from saving/loading the model I never specify a device explicitly. In my forward pass, when I need a new tensor, I usually use input_tensor.new_*(...) to create it on the same device as the input.
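For reference, a minimal sketch of that new_* pattern (the helper name is illustrative, not from my actual code):

```python
import torch

def forward_helper(input_tensor):
    # new_zeros inherits dtype and device from input_tensor,
    # so nothing device-specific is hard-coded here
    mask = input_tensor.new_zeros(input_tensor.size(0))
    return mask

x = torch.randn(4, 3)
mask = forward_helper(x)
assert mask.device == x.device and mask.dtype == x.dtype
```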
Does anyone have an idea where the difference in performance might come from?
Hi, I have a similar issue which is mystifying. I started with this repo https://github.com/BelBES/crnn-pytorch and customised it for my project. But the model_loader is essentially the same as in the repo:
from collections import OrderedDict

import torch
import torch.nn as nn

from .crnn import CRNN

def load_weights(target, source_state):
    new_dict = OrderedDict()
    for k, v in target.state_dict().items():
        if k in source_state:
            if v.size() == source_state[k].size():
                new_dict[k] = source_state[k]
            else:
                print('Size MISMATCH: {} vs {}'.format(v.size(), source_state[k].size()))
                new_dict[k] = v
        else:
            if 'num_batches_tracked' not in k:
                print('Layer NOT FOUND: {}'.format(k))
            new_dict[k] = v
    target.load_state_dict(new_dict)

def load_model(abc, seq_proj=[0, 0], backend='resnet18', snapshot=None, cuda=True):
    net = CRNN(abc=abc,
               seq_proj=seq_proj,
               backend=backend,
               rnn_hidden_size=128,
               rnn_num_layers=2,
               rnn_dropout=0.5)
    net = nn.DataParallel(net)
    if snapshot is not None:
        load_weights(net, torch.load(snapshot, map_location=lambda storage, loc: storage))
    if cuda:
        net = net.cuda()
    return net
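The map_location lambda above keeps every storage on CPU, which is what makes the checkpoint loadable on a machine without a GPU. A minimal sketch of the same round trip (the tiny Linear model and in-memory buffer are just stand-ins for the real CRNN and snapshot file):

```python
import io

import torch
import torch.nn as nn

# stand-in for the real model: a tiny module is enough to show the round trip
net = nn.Linear(4, 2)

# save CPU copies of the weights (an in-memory buffer stands in for a file)
buf = io.BytesIO()
torch.save({k: v.cpu() for k, v in net.state_dict().items()}, buf)
buf.seek(0)

# map_location='cpu' behaves like the `lambda storage, loc: storage` form:
# all storages stay on CPU, so this works on a GPU-less machine
state = torch.load(buf, map_location='cpu')
net2 = nn.Linear(4, 2)
net2.load_state_dict(state)
assert all(torch.equal(a, b) for a, b in zip(net.parameters(), net2.parameters()))
```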
When I train and validate the model on GPU on a Ubuntu cloud instance, it gives very good performance, but when I copy the trained checkpoint to my MacBook and run a test script, it gives terrible performance, almost as if the weights were random. But once in a while for a particular checkpoint, it works fine.
I did some investigating, but there doesn't seem to be any problem with the load_weights() function. The only weights which are there in the checkpoint but not in the model definition are the 'num_batches_tracked' buffers in the batch norm layers. I don't think these are used during eval anyway.
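For what it's worth, num_batches_tracked is a buffer that batch norm keeps in its state_dict alongside running_mean and running_var; in eval mode the layer normalises with the stored running statistics, and as far as I know num_batches_tracked is only consulted during training when momentum=None. A quick sketch:

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
print(sorted(bn.state_dict().keys()))
# ['bias', 'num_batches_tracked', 'running_mean', 'running_var', 'weight']

# in eval mode the layer uses running_mean/running_var, not batch statistics
bn.eval()
x = torch.randn(2, 8, 4, 4)
out = bn(x)
```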
Did you solve this? I have a similar problem: training the model on a Linux cloud instance gives nice results, but loading the checkpoint on my MacBook really tanks the performance!