RuntimeError: Expected hidden size (2, 9, 100), got (2, 20, 100)

I have been using PyTorch for CNNs and now need to use an RNN. However, after a few days of effort, I still cannot get it to run. The network is shown below.

class Net(nn.Module):
    def __init__(self, input_dim, nb_lstm_units, layer_dim, output_dim, batch_size=20):
        super(Net, self).__init__()
        self.nb_lstm_units = nb_lstm_units
        self.layer_dim = layer_dim
        self.batch_size = batch_size

        self.rnn = nn.RNN(input_dim, nb_lstm_units, layer_dim, batch_first=True, nonlinearity='relu')
        self.fc_out = nn.Linear(nb_lstm_units, output_dim)        
    def init_hidden(self):
        hidden_a = torch.randn(self.layer_dim, self.batch_size, self.nb_lstm_units)        
        hidden_a = hidden_a.cuda()            
        return Variable(hidden_a)
    def forward(self, X): 
        batch_size, seq_len, _ = X.size()

        # dimension (batch, n_features, timestamp) -> (batch, timestamp, n_features)
        X = X.transpose(1, 2)

        self.hidden = self.init_hidden()
        out, self.hidden = self.rnn(X, self.hidden)

        out = self.fc_out( out[:, -1, :] )
        return out

# Create RNN
input_dim = 1
nb_lstm_units = 100
layer_dim = 2
output_dim = 2
model = Net(input_dim, nb_lstm_units, layer_dim, output_dim)

The input is a 1D ECG signal segmented into windows of 217 samples, with a batch size of 20. The forward pass runs and I can print the output of self.fc_out, but the error appears afterward. Does the hidden output of the RNN need extra processing?

Why do you call init_hidden in forward? If you want the hidden state to be a parameter of your model, you should initialize it in the constructor.

Since each data segment is passed to forward one at a time, I don't want the RNN to link the current segment with previous ones. Please correct me if I am wrong. Several RNN tutorials (like this) do it in a similar way.

In the tutorial you linked, there are two differences:

  • They initialize with zeros, you initialize with random values
  • They don’t store it as the module’s parameters but you do

Changing the initialisation and the rnn call, as suggested, produces the same error.

h0 = self.init_hidden()
out, h1 = self.rnn(X, h0)

Is this problem a known open issue, or am I missing something? I had to move to Keras to do the same thing, and it works fine there. But PyTorch gives me much more freedom to play with my code.

Any help is appreciated.

In forward(), passing None as the hidden state made the error message disappear.

self.rnn(X, None)
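
A possible explanation, read off the error message: the RNN received a batch of 9 sequences, while init_hidden always builds a state for self.batch_size = 20. That would happen if, for example, the last batch of an epoch is smaller than 20 (an assumption, since the data loader is not shown in this thread). Passing None works because nn.RNN then creates a zero hidden state that matches the input batch. A minimal sketch of the same idea with an explicit h0, where the batch_size and device arguments of init_hidden are changes made for illustration:

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self, input_dim, nb_lstm_units, layer_dim, output_dim):
        super().__init__()
        self.nb_lstm_units = nb_lstm_units
        self.layer_dim = layer_dim
        self.rnn = nn.RNN(input_dim, nb_lstm_units, layer_dim,
                          batch_first=True, nonlinearity='relu')
        self.fc_out = nn.Linear(nb_lstm_units, output_dim)

    def init_hidden(self, batch_size, device):
        # Zeros (as in the tutorial), sized from the batch actually being processed
        # rather than from a fixed batch size stored in the constructor.
        return torch.zeros(self.layer_dim, batch_size, self.nb_lstm_units,
                           device=device)

    def forward(self, X):
        # X arrives as (batch, n_features, timestamp)
        batch_size = X.size(0)
        X = X.transpose(1, 2)  # -> (batch, timestamp, n_features)
        h0 = self.init_hidden(batch_size, X.device)
        out, _ = self.rnn(X, h0)  # fresh state per call, nothing carried over
        return self.fc_out(out[:, -1, :])


model = Net(input_dim=1, nb_lstm_units=100, layer_dim=2, output_dim=2)
full_batch = model(torch.randn(20, 1, 217))   # a full batch of 20 segments
short_batch = model(torch.randn(9, 1, 217))   # a smaller batch of 9 segments
```

Because the state is rebuilt from zeros on every call, segments are not linked to one another, which matches the intent described earlier in the thread.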

Now the training loss decreases nicely but then suddenly jumps to a huge positive value. Judging from other examples, this is not the ideal approach. Any suggestions on how to handle the hidden state and the initial weights are much appreciated.
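
One thing that may be worth trying for the sudden jump is gradient clipping, since a ReLU RNN unrolled over 217 steps can produce very large gradients. Whether that is the cause here is a guess. The sketch below shows a single training step with torch.nn.utils.clip_grad_norm_. The loss function, optimizer, learning rate, max_norm value, and random data are all placeholders, because none of them appear in the thread.

```python
import torch
import torch.nn as nn

# Stand-in layers with the same sizes as in the question.
rnn = nn.RNN(1, 100, 2, batch_first=True, nonlinearity='relu')
fc = nn.Linear(100, 2)
params = list(rnn.parameters()) + list(fc.parameters())

# Placeholder choices: the thread does not say which loss/optimizer is used.
optimizer = torch.optim.SGD(params, lr=0.01)
criterion = nn.CrossEntropyLoss()

# Random data shaped (batch, timestamp, n_features) with two classes.
X = torch.randn(20, 217, 1)
y = torch.randint(0, 2, (20,))

optimizer.zero_grad()
out, _ = rnn(X)                       # hidden state defaults to zeros
loss = criterion(fc(out[:, -1, :]), y)
loss.backward()

# Rescale all gradients so that their combined L2 norm is at most max_norm.
max_norm = 1.0                        # illustrative value, needs tuning
norm_before = torch.nn.utils.clip_grad_norm_(params, max_norm)
optimizer.step()
```

clip_grad_norm_ returns the gradient norm measured before clipping, so logging norm_before during training would show whether the gradients actually spike around the point where the loss blows up.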