Different results on CPU and GPU

I trained my model on GPU and saved the model with

params = {k:v.cpu() for k, v in self.state_dict().items()}
torch.save(params, open('model.pt', 'wb'))

Then, when evaluating the model, I load it like this:

model = m(...)
params = torch.load('model.pt')
model.load_state_dict(params)  # apply the saved weights

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

However, I am getting much worse results on GPU than on CPU. On CPU it seems to perform correctly (I get results similar to those in the paper), but on GPU it is about ~20% worse.

I wrote all my code to be device-agnostic, so apart from saving/loading the model, I never specify a device. In my forward pass, when I needed a new tensor, I usually used input_tensor.new_*(...) to create it on the same device as the input.
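For reference, a minimal sketch of that device-agnostic pattern (the helper name here is made up for illustration):

```python
import torch

def shift_by_one(x):
    # new_ones() allocates on the same device and with the same dtype
    # as x, so this function never needs to name a device explicitly.
    ones = x.new_ones(x.shape)
    # Equivalent explicit form:
    #   torch.ones(x.shape, dtype=x.dtype, device=x.device)
    return x + ones

x = torch.zeros(2, 3)        # CPU here, but the code runs unchanged on CUDA
y = shift_by_one(x)
print(y.device, y.tolist())  # same device as x; all ones
```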

Anyone got an idea where the different performances might come from?

Can you share your model, please?

Hi, I have a similar issue which is mystifying. I started with this repo https://github.com/BelBES/crnn-pytorch and customised it for my project. But the model_loader is essentially the same as in the repo:

from collections import OrderedDict

from .crnn import CRNN

def load_weights(target, source_state):
    new_dict = OrderedDict()
    for k, v in target.state_dict().items():
        if k in source_state:
            if v.size() == source_state[k].size():
                new_dict[k] = source_state[k]
            else:
                print('Size MISMATCH: {} vs {}'.format(v.size(), source_state[k].size()))
                new_dict[k] = v
        else:
            if 'num_batches_tracked' not in k:
                print('Layer NOT FOUND: {}'.format(k))
            new_dict[k] = v
    target.load_state_dict(new_dict)

def load_model(abc, seq_proj=[0, 0], backend='resnet18', snapshot=None, cuda=True):
    net = CRNN(abc=abc, seq_proj=seq_proj, backend=backend)
    net = nn.DataParallel(net)
    if snapshot is not None:
        load_weights(net, torch.load(snapshot, map_location=lambda storage, loc: storage))
    if cuda:
        net = net.cuda()
    return net

When I train and validate the model on GPU on a Ubuntu cloud instance, it gives very good performance, but when I copy the trained checkpoint to my MacBook and run a test script, it gives terrible performance, almost as if the weights were random. But once in a while for a particular checkpoint, it works fine.

I did some investigating but there doesn’t seem to be any problem with the load_weights() function. The only weights which are there in the checkpoint but not in the model definition are the ‘num_batches_tracked’ in the batch norm layers. Don’t think these are used during eval anyway.
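One quick way to rule loading out (a sketch, not code from the repo): diff the checkpoint keys against the model's own state_dict. Note that nn.DataParallel prefixes every key with `module.`, so a checkpoint saved from the wrapped model will not line up with an unwrapped one, and a loader that quietly falls back to the model's freshly-initialised weights on a key mismatch would look exactly like near-random performance:

```python
import torch
from torch import nn

def check_checkpoint(model, state):
    # Report keys present in the model but not the checkpoint, and vice versa.
    model_keys = set(model.state_dict().keys())
    ckpt_keys = set(state.keys())
    return model_keys - ckpt_keys, ckpt_keys - model_keys

def strip_module_prefix(state):
    # Remove the "module." prefix that nn.DataParallel adds, if present.
    return {k[len('module.'):] if k.startswith('module.') else k: v
            for k, v in state.items()}

# Toy demonstration with a small stand-in model:
net = nn.Linear(4, 2)
ckpt = nn.DataParallel(net).state_dict()   # keys look like "module.weight"
missing, unexpected = check_checkpoint(net, ckpt)
print(sorted(unexpected))                  # the "module."-prefixed keys
net.load_state_dict(strip_module_prefix(ckpt))  # loads cleanly once stripped
```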

Any ideas? Thanks

Did you solve this? I have a similar problem: training the model on a Linux cloud instance gives nice results, but loading the checkpoint on my MacBook really tanks the performance!

I had the same problem. It was driving me crazy

Hey guys, it's been a long time and I don't remember what the issue was, sorry.
Could’ve been something to do with the alphabet.

What does the alphabet mean?
