RuntimeError: Expected hidden size (2, 1, 128), got [64, 1, 128]

I tried to build a simple RNN for my credit card fraud dataset, but I get an error.

class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(SimpleRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)

        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # initialize hidden state with zeros
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        h0 = h0[0]
        
        # Debugging: Print the shapes of batch_X and h0
        print(f"batch_X shape: {x.shape}")
        print(f"h0 shape: {h0.shape}")

        out, _ = self.rnn(x, h0)

        out = out[:, -1, :]
        out = self.fc(out)
        
        return out

sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=2)

for train_index, test_index in sss.split(x, y):
    X_train, X_test = x.iloc[train_index], x.iloc[test_index]
    y_train, y_test = y.iloc[train_index], y.iloc[test_index]

X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32)

X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32)

train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)

batch_size = 64
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=batch_size)
input_size = X_train_tensor.shape[1]
hidden_size = 128
num_layers = 2
num_classes = 1
lr = 0.001

X_train_tensor = X_train_tensor.view(-1, input_size)
model = SimpleRNN(input_size, hidden_size, num_layers, num_classes)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

The error:

batch_X shape: torch.Size([64, 30])
h0 shape: torch.Size([64, 128])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[91], line 9
      7 for batch_X, batch_y in train_loader:
      8     optimizer.zero_grad()
----> 9     outputs = model(batch_X)
     10     loss = criterion(outputs.squeeze(), batch_y)
     11     loss.backward()

File ~\OneDrive\Masaüstü\BUTTERFLY\cv\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

Cell In[83], line 20, in SimpleRNN.forward(self, x)
     17 print(f"batch_X shape: {x.shape}")
     18 print(f"h0 shape: {h0.shape}")
---> 20 out, _ = self.rnn(x, h0)
     22 out = out[:, -1, :]
     23 out = self.fc(out)

File ~\OneDrive\Masaüstü\BUTTERFLY\cv\Lib\site-packages\torch\nn\modules\module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~\OneDrive\Masaüstü\BUTTERFLY\cv\Lib\site-packages\torch\nn\modules\rnn.py:505, in RNN.forward(self, input, hx)
    502     hx = self.permute_hidden(hx, sorted_indices)
    504 assert hx is not None
--> 505 self.check_forward_args(input, hx, batch_sizes)
    506 assert self.mode == 'RNN_TANH' or self.mode == 'RNN_RELU'
    507 if batch_sizes is None:

File ~\OneDrive\Masaüstü\BUTTERFLY\cv\Lib\site-packages\torch\nn\modules\rnn.py:256, in RNNBase.check_forward_args(self, input, hidden, batch_sizes)
    253 self.check_input(input, batch_sizes)
    254 expected_hidden_size = self.get_expected_hidden_size(input, batch_sizes)
--> 256 self.check_hidden_size(hidden, expected_hidden_size)

File ~\OneDrive\Masaüstü\BUTTERFLY\cv\Lib\site-packages\torch\nn\modules\rnn.py:239, in RNNBase.check_hidden_size(self, hx, expected_hidden_size, msg)
    236 def check_hidden_size(self, hx: Tensor, expected_hidden_size: Tuple[int, int, int],
    237                       msg: str = 'Expected hidden size {}, got {}') -> None:
    238     if hx.size() != expected_hidden_size:
--> 239         raise RuntimeError(msg.format(expected_hidden_size, list(hx.size())))

RuntimeError: Expected hidden size (2, 1, 128), got [64, 1, 128]

No idea why you are doing h0 = h0[0] in forward()…but don’t :).

Before the h0 = h0[0] line, h0 has the correct shape: (num_layers, batch_size, hidden_size). After that line, it’s only (batch_size, hidden_size). Since nn.RNN sees only 2D tensors, it treats the call as unbatched: the input (64, 30) is read as (seq_len, input_size) and h0 (64, 128) as (num_layers, hidden_size).
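To make the shape change concrete, here is a minimal sketch using the sizes from the question (num_layers=2, batch_size=64, hidden_size=128). The variable name h0_sliced is only for illustration and is not in the original code.

```python
import torch

num_layers, batch_size, hidden_size = 2, 64, 128

# Shape nn.RNN expects for a batched call: (num_layers, batch_size, hidden_size)
h0 = torch.zeros(num_layers, batch_size, hidden_size)
print(h0.shape)         # torch.Size([2, 64, 128])

# Indexing with [0] selects the first layer and drops that dimension entirely
h0_sliced = h0[0]
print(h0_sliced.shape)  # torch.Size([64, 128])
```

The second shape is exactly the "h0 shape" printed in the question's debug output.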

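The problem is that nn.RNN works with 3D tensors internally. It therefore does an unsqueeze(1) on h0 to add the batch dimension. Again, nn.RNN thinks it’s an unbatched input, so it makes it a batched one with batch size 1. That is how h0 ends up as [64, 1, 128] while the expected shape is (num_layers, 1, hidden_size) = (2, 1, 128). You can also check this post that addresses the same issue.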
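The mismatch can be reproduced outside the model with a few lines. This is only a sketch, assuming random data with the same shapes as in the traceback (a batch of 64 rows with 30 features); the variable names are illustrative.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=30, hidden_size=128, num_layers=2, batch_first=True)

x = torch.randn(64, 30)        # same shape as batch_X in the traceback
h0_2d = torch.zeros(64, 128)   # what is left after h0 = h0[0]

# Both tensors are 2D, so the call is treated as unbatched and the
# hidden-size check fails because 64 != num_layers
try:
    rnn(x, h0_2d)
    error_message = None
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

With the PyTorch version shown in the traceback, this should print the same "Expected hidden size" message as in the question.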


I asked ChatGPT to fix it before posting, silly GPT. I fixed the problem by removing that line, but forgot to update the post. Thank you.
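For reference, a sketch of what the model could look like with the h0 = h0[0] line removed. One thing here is an assumption and not from the thread: since the batches in the traceback are 2D (64, 30), this version treats every row as a sequence of length 1 by adding a seq_len dimension before calling nn.RNN. If the data is already (batch, seq_len, features), the unsqueeze is skipped.

```python
import torch
import torch.nn as nn


class SimpleRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Assumption: a 2D batch (batch, features) means one time step per sample
        if x.dim() == 2:
            x = x.unsqueeze(1)  # -> (batch, seq_len=1, features)

        # Keep the full 3D hidden state: (num_layers, batch, hidden_size)
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)

        out, _ = self.rnn(x, h0)
        return self.fc(out[:, -1, :])  # logits from the last time step


model = SimpleRNN(input_size=30, hidden_size=128, num_layers=2, num_classes=1)
logits = model(torch.randn(64, 30))
print(logits.shape)  # torch.Size([64, 1])
```

The (64, 1) output matches the training loop in the traceback, where outputs.squeeze() is passed to the loss together with batch_y.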