Error when loading state_dict, even though the dimensions match

I was training a Neural Turing Machine with a memory bias and got the following error when loading the model. It has to do with mem_bias, yet the error message itself shows that the sizes match: torch.Size([128, 20]) in both the model and the checkpoint.

Please note the variable was saved as a buffer, but other variables were registered the same way and the load seems to have worked for them:
self.register_buffer('mem_bias', Variable(torch.Tensor(N, M)))

It was all run on CPU.

Error message:
Traceback (most recent call last):
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 514, in load_state_dict
    own_state[name].copy_(param)
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\autograd\variable.py", line 67, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'Variable' object has no attribute 'copy_'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:/Users/frede/Documents/GitHub/Homework3/Evaluate_NTM_LSTM.py", line 56, in <module>
    loadNTM(model_NTM_LSTM, files_NTM_LSTM)
  File "C:\Users\frede\Documents\GitHub\Homework3\train.py", line 89, in loadNTM
    model.net.load_state_dict(state_dict)
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 519, in load_state_dict
    .format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named ntm.memory.mem_bias, whose dimensions in the model are torch.Size([128, 20]) and whose dimensions in the checkpoint are torch.Size([128, 20]).

Any idea what it could be?

I suppose your mem_bias is trainable?
If so, you should use register_parameter.
Using register_buffer makes sure the tensor is part of the model’s state_dict, but it is not a Parameter, so the optimizer won’t update it.
For example the running_mean and running_var in BatchNorm layers are registered as buffers.

Your error probably comes from using Variables. Registered buffers should be Tensors (source).
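
For illustration, here is a minimal sketch of both registrations with plain Tensors (the module name and shapes are made up):

import torch
import torch.nn as nn

class Memory(nn.Module):
    def __init__(self, N, M):
        super(Memory, self).__init__()
        # trainable: appears in the state_dict AND receives gradients
        self.register_parameter('mem_weight', nn.Parameter(torch.Tensor(N, M)))
        # not trainable: appears in the state_dict, but gets no gradients
        # (like running_mean / running_var in BatchNorm)
        self.register_buffer('mem_bias', torch.Tensor(N, M))

mem = Memory(128, 20)
print(mem.state_dict().keys())  # contains both 'mem_weight' and 'mem_bias'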

Thanks for your quick reply.

mem_bias in this case is just an initialization variable, so it is not meant to be a trainable Parameter.

I tried not registering the buffer at all, but then I get the following error when trying to load the state_dict.

C:/Users/frede/Documents/GitHub/Homework3/Evaluate_NTM_LSTM.py
C:\Users\frede\Documents\GitHub\Homework3\copy\copy-task_NTM-LSTM-1000-batch-1000.model
Traceback (most recent call last):
  File "C:/Users/frede/Documents/GitHub/Homework3/Evaluate_NTM_LSTM.py", line 56, in <module>
    loadNTM(model_NTM_LSTM, files_NTM_LSTM)
  File "C:\Users\frede\Documents\GitHub\Homework3\train.py", line 89, in loadNTM
    model.net.load_state_dict(state_dict)
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 526, in load_state_dict
    raise KeyError('missing keys in state_dict: "{}"'.format(missing))
KeyError: 'missing keys in state_dict: "{'memory.mem_bias', 'ntm.memory.mem_bias', 'ntm.heads.1.memory.mem_bias', 'ntm.heads.0.memory.mem_bias'}"'

Any other ideas on how to save and load the model in this setting?

You can register the buffer, but as a Tensor, not as a Variable.
Could you try that again? Let me know if it works.
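
If you then still see missing keys when loading, a version-agnostic workaround is to merge the checkpoint into the model’s own state_dict before loading, so no key is missing (just a sketch; model and state_dict stand for your own objects):

# own_state contains everything the model expects, buffers included
own_state = model.state_dict()
# keep only the checkpoint entries the model actually has
filtered = {k: v for k, v in state_dict.items() if k in own_state}
own_state.update(filtered)
model.load_state_dict(own_state)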

I just tried to register the buffer as a torch.Tensor:
init_mem_bias = torch.Tensor(self.N, self.M)
self.register_buffer('mem_bias', init_mem_bias)

This triggered a type error later in the program, where a Variable is expected:

  File "C:\Users\frede\Documents\GitHub\Homework3\ntm\memory.py", line 177, in _similarity
    w = F.softmax(β * F.cosine_similarity(self.memory + 1e-16, k + 1e-16, dim=-1), dim=1)
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\nn\functional.py", line 1639, in cosine_similarity
    w12 = torch.sum(x1 * x2, dim)
  File "C:\Users\frede\Anaconda3\lib\site-packages\torch\tensor.py", line 321, in __mul__
    return self.mul(other)
TypeError: mul received an invalid combination of arguments - got (Variable), but expected one of:

  • (float value)
    didn't match because some of the arguments have invalid types: (!Variable!)
  • (torch.FloatTensor other)
    didn't match because some of the arguments have invalid types: (!Variable!)

At this stage, since I have a report to submit, I will just have to do without saving the model. It might be due to the complicated structure of the Neural Turing Machine, because I normally don’t have any issues saving and loading a state_dict.

Thanks a lot for your input anyway.

This problem seems to come from self.memory or k: the TypeError suggests a plain Tensor is being multiplied with a Variable.
Could you check whether one of them is still a Variable while the other is a plain Tensor?
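
One pattern from NTM implementations of that era (a sketch; the reset method and batch_size argument are assumptions about your code) is to keep mem_bias as a plain Tensor buffer and wrap the working copy in a Variable at the point of use:

from torch.autograd import Variable

# sketch: inside the memory module, wherever the working memory is (re)initialized
def reset(self, batch_size):
    # mem_bias stays a plain Tensor in the buffer, so save/load works;
    # the working memory becomes a Variable, so autograd ops such as
    # F.cosine_similarity can combine it with other Variables like k
    self.memory = Variable(self.mem_bias.clone().repeat(batch_size, 1, 1))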

Hi,
did you manage to solve this issue? I am facing the exact same error while loading an NTM model as well.

I managed to run and submit my report without loading, so I didn’t solve the issue.

It’s probably a Variable vs. Parameter issue, but I didn’t have time to sort it out.