I’m facing a cryptic error which I can’t find any discussion about.
I’m simply trying to pass the output of an LSTM layer to a linear layer. The code is as follows:
output, hidden = hidden
logits = self.fully_connected(output)
`fully_connected` is simply an `nn.Linear`, and `output` is a `torch.cuda.FloatTensor` of size `(batch_size, n_hidden)`.
The error message I’m receiving is the following:
TypeError: CudaThreshold_updateOutput received an invalid combination of arguments - got (int, torch.cuda.FloatTensor, torch.cuda.FloatTensor, int, int, Linear), but expected (int state, torch.cuda.FloatTensor input, torch.cuda.FloatTensor output, float threshold, float val, bool inplace)
Does anyone know what might be the cause of this error?
Somewhere in your code, instead of passing the output of the linear layer to the next layer, you are passing the `Linear` layer itself.
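For illustration, here is a minimal sketch of that mismatch (hypothetical names, not the poster’s actual code): an op like `F.relu` expects a tensor, so handing it the module object instead of the module’s output raises a `TypeError` much like the one above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(4, 2)
x = torch.randn(3, 4)

# Correct: call the layer to get a tensor, then apply relu to that output
out = F.relu(fc(x))  # out has shape (3, 2)

# Bug pattern: handing the Linear module itself to an op that expects a tensor
# F.relu(fc)         # raises a TypeError, since a Module is not a Tensor
```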
Is there a small snippet that reproduces this?
I’m not sure.
What I’m trying to do is copy the parameters of a language model and use them to pretrain a text classifier. I’m trying to copy the word embedding and an LSTM unit. What I’m doing is the following:
- I load both modules with PyTorch:
checkpoint = torch.load(load_model)
embed = checkpoint['embed']
rnn = checkpoint['rnn']
- From the `rnn` I try to get only the LSTM cell:
Out: LSTMCell(256, 1024)
- I try to pass these modules to a new RNN:
self.embedding = copy.deepcopy(pretrained_embed)
self.rnn_cell = copy.deepcopy(pretrained_lstm_cell)
self.rnn_size = pretrained_lstm_cell.hidden_size
- Then I try to connect the copied components to a new Linear layer, from which I expect to get the logits of my text classifier:
self.fully_connected = nn.Linear(self.rnn_size, num_classes)
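Putting the steps above together, the classifier might be assembled like this (a sketch with assumed shapes: the manual unrolling loop and the zero-state initialization are my assumptions, not the poster’s code):

```python
import copy
import torch
import torch.nn as nn

class TextClassifier(nn.Module):
    def __init__(self, pretrained_embed, pretrained_lstm_cell, num_classes):
        super().__init__()
        # Deep-copy the pretrained pieces so the new graph owns its own parameters
        self.embedding = copy.deepcopy(pretrained_embed)
        self.rnn_cell = copy.deepcopy(pretrained_lstm_cell)
        self.rnn_size = pretrained_lstm_cell.hidden_size
        self.fully_connected = nn.Linear(self.rnn_size, num_classes)

    def forward(self, tokens):
        # tokens: LongTensor of shape (seq_len, batch)
        batch = tokens.size(1)
        h = tokens.new_zeros(batch, self.rnn_size, dtype=torch.float)
        c = tokens.new_zeros(batch, self.rnn_size, dtype=torch.float)
        for step in self.embedding(tokens):   # iterate over time steps
            h, c = self.rnn_cell(step, (h, c))
        return self.fully_connected(h)        # logits from the final hidden state
```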
Maybe I’m copying the LSTMCell in the wrong way and it is still attached to the linear output of my language model? If that is the case, what would be the correct way to reuse my pretrained parameters in this new graph?
In case anyone faces the same problem: what happened was that I had appended the `F.relu` operation directly to the `nn.Linear` in the graph. I believe this is what caused the problem when I tried to use the layer in the `forward` method.
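In other words (a reconstruction of the bug pattern, not the original code): `F.relu` is a plain function, not an `nn.Module`, so it cannot be stored in the graph next to the `Linear`. Either apply it in `forward`, or use the module form `nn.ReLU()`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(8, 3)
x = torch.randn(2, 8)

# Option 1: keep the layer bare and apply the functional relu in forward()
logits = F.relu(fc(x))

# Option 2: use the module form, which composes cleanly in a container
model = nn.Sequential(fc, nn.ReLU())
out = model(x)  # same result as Option 1
```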