Empty state_dict after moving model to GPU

Hi all,

I am encountering a weird phenomenon when trying to save the parameters of a custom model that has been moved to the GPU in PyTorch 0.4.0. When I call model.state_dict(), the resulting OrderedDict is empty. See the following MWE:

import torch
from torch.nn import Module, Parameter

class test_module(Module):

    def __init__(self, device=None):

        super(test_module, self).__init__()
        if device is not None:
            self.device = device
        else:
            self.device = torch.device('cpu')
        self.W = Parameter(torch.ones((3, 5))).to(self.device)

cpu = test_module()
print(cpu.state_dict())

gpu = test_module(torch.device('cuda'))
print(gpu.state_dict())

The first printout gives me the desired dictionary, but the second one is empty. Furthermore, if I call print(cpu.to(torch.device('cuda')).state_dict()) I do get the correct dictionary, while if I call print(gpu.to(torch.device('cpu')).state_dict()) I get an empty object.

Is this expected behavior, or am I doing something wrong?

Move the to(self.device) operation into the Parameter call:

self.W = Parameter(torch.ones((3, 5)).to(self.device))

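In your code, .to() is a no-op in the CPU case: the tensor is already on the CPU, so the nn.Parameter is returned unchanged. In the GPU case, .to() returns a new, plain tensor rather than an nn.Parameter, so self.W is never registered as a parameter and does not show up in the state_dict.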
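
For reference, here is a minimal sketch of the corrected module. The class name FixedModule and the CPU fallback are my own additions so that it also runs on a machine without a GPU. The last two lines use a dtype conversion as a stand-in for the device move, since both produce a new tensor instead of returning the original Parameter.

```python
import torch
from torch.nn import Module, Parameter


class FixedModule(Module):  # hypothetical name, not from the original post
    def __init__(self, device=None):
        super().__init__()
        self.device = device if device is not None else torch.device('cpu')
        # Move the tensor first, then wrap it, so that self.W is an
        # nn.Parameter and gets registered with the module.
        self.W = Parameter(torch.ones((3, 5)).to(self.device))


# Assumption: fall back to the CPU when no GPU is available.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = FixedModule(device)
print(list(model.state_dict().keys()))

# Calling .to() on a Parameter with a conversion that is not a no-op
# yields a plain tensor, which is what happened in the GPU case.
converted = Parameter(torch.ones((3, 5))).to(torch.float64)
print(isinstance(converted, Parameter))
```

Another common approach is to create the parameter without any device handling in __init__ and call model.to(device) on the whole module afterwards, which moves all registered parameters for you.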


That worked seamlessly. Thank you very much for the explanation!