Seq2seq: For unbatched 2-D input, hx and cx should also be 2-D but got (3-D, 3-D) tensors

I debugged the program step by step. The first step is the encoder layer, which runs without any error, as shown below:

import torch
import torch.nn as nn

input_dim = len(en_vocab)
output_dim = 11
encoder_embedding_dim = 512
decoder_embedding_dim = 512
hidden_dim = 1024
n_layers = 2
encoder_dropout = 0.5
decoder_dropout = 0.5
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

encoder = Encoder(
    input_dim,
    encoder_embedding_dim,
    hidden_dim,
    n_layers,
    encoder_dropout,
)
hidden, cell = encoder(batch["numericalized_input"])
hidden.shape, cell.shape

Output:

src shape: torch.Size([26, 3])
embedded shape: torch.Size([26, 3, 512])
hidden shape: torch.Size([2, 26, 1024])
(torch.Size([2, 26, 1024]), torch.Size([2, 26, 1024]))
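
The Encoder class itself is not shown here. A minimal sketch consistent with the printed shapes would look like the following; the key assumption (not confirmed by the code above) is that the encoder's LSTM was created with batch_first=True, which is what would make hidden come out as [2, 26, 1024] instead of [2, 3, 1024] for a [26, 3] seq-first batch:

class Encoder(nn.Module):
    # Hypothetical reconstruction, not the original code from this thread.
    def __init__(self, input_dim, embedding_dim, hidden_dim, n_layers, dropout):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        # Assumption: batch_first=True, which explains the printed hidden shape.
        self.rnn = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                           dropout=dropout, batch_first=True)
        self.dropout = nn.Dropout(dropout)

    def forward(self, src):
        print('src shape:', src.shape)            # [26, 3] = [seq_len, batch]
        embedded = self.dropout(self.embedding(src))
        print('embedded shape:', embedded.shape)  # [26, 3, 512]
        # batch_first=True reads [26, 3, 512] as [batch=26, seq_len=3, 512],
        # so hidden becomes [2, 26, 1024] rather than [2, 3, 1024].
        outputs, (hidden, cell) = self.rnn(embedded)
        print('hidden shape:', hidden.shape)      # [2, 26, 1024]
        return hidden, cell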

My decoder layer

class Decoder(nn.Module):
    def __init__(self, output_dim, hidden_dim, n_layers, dropout):
        super().__init__()
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        self.rnn = nn.LSTM(output_dim, hidden_dim, n_layers, dropout=dropout, batch_first=True)
        self.fc_out = nn.Linear(hidden_dim, output_dim)
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, hidden, cell):
        # input arrives as one time step across the batch, shape [batch_size]
        input = input.unsqueeze(0)  # -> [1, batch_size], still only 2-D
        input = input.float()
        print('input:', input)
        output, (hidden, cell) = self.rnn(input, (hidden, cell))
        prediction = self.fc_out(output.squeeze(0))
        return prediction, hidden, output

When I run this program

decoder = Decoder(
    output_dim,
    hidden_dim,
    n_layers,
    decoder_dropout,
)
prediction, hidden, output = decoder(batch["target"][0], hidden, cell)
prediction.shape, hidden.shape

I’m getting the following error:

RuntimeError: For unbatched 2-D input, hx and cx should also be 2-D but got (3-D, 3-D) tensors
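
As far as I can tell, nn.LSTM decides between batched and unbatched mode from the dimensionality of its input: a 3-D input is batched and needs 3-D hidden and cell states, while a 2-D input is treated as a single unbatched sequence and needs 2-D states. In my decoder, batch["target"][0] has shape [3] (one time step across the batch of 3), and unsqueeze(0) only makes it [1, 3], which is still 2-D, while hidden and cell from the encoder are 3-D. A small sketch of the shape rules, with made-up sizes:

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=11, hidden_size=1024, num_layers=2, batch_first=True)

# Batched: 3-D input goes with 3-D hidden/cell.
x = torch.randn(3, 1, 11)      # [batch, seq_len, input_size]
h0 = torch.zeros(2, 3, 1024)   # [n_layers, batch, hidden_dim]
c0 = torch.zeros(2, 3, 1024)
out, (h, c) = rnn(x, (h0, c0))  # OK

# Unbatched: 2-D input requires 2-D hidden/cell.
x2 = torch.randn(1, 11)        # [seq_len, input_size], no batch dimension
out2, (h2, c2) = rnn(x2, (h0[:, 0], c0[:, 0]))  # 2-D states: [n_layers, hidden_dim]

# Mixing the two modes reproduces the reported error:
# rnn(x2, (h0, c0))  # RuntimeError: For unbatched 2-D input, hx and cx should also be 2-D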

Does anyone have a solution? Kindly reply. Thanks in advance.

What are you trying to do in the first place, i.e., what kind of task are you trying to learn?

I’m just asking since it seems odd that batch["target"][0] is the input parameter of the forward() method of the decoder.

I’m trying to build a seq2seq model with text as input and numerical data as output. I convert the input text to word indices to pass through the encoder, and prepare the numerical data for the decoder.

batch_size = 3
train_data_loader = get_data_loader(train_data, batch_size, pad_index, shuffle=True)

To debug the program, I just loop over the batches and print batch["numericalized_input"] and batch["target"]:

for i, batch in enumerate(train_data_loader):
    print('i', i)
    src = batch["numericalized_input"]
    trg = batch["target"]
    print('src:', src)
    print('trg:', trg)

Output:

i 0
src: tensor([[ 2,  2,  2],
        [ 4,  4,  4],
        [ 0,  0,  0],
        [ 5,  5,  5],
        [ 4,  4,  4],
        [ 0,  0,  0],
        [ 0,  0,  6],
        [18,  3,  0],
        [ 4,  1,  0],
        [23,  1,  4],
        [ 6,  1, 11],
        [ 5,  1,  0],
        [ 0,  1,  7],
        [ 9,  1,  0],
        [ 0,  1,  7],
        [ 0,  1,  6],
        [21,  1, 13],
        [ 7,  1,  4],
        [20,  1, 16],
        [ 8,  1,  8],
        [ 4,  1, 17],
        [11,  1,  8],
        [ 3,  1,  4],
        [ 1,  1, 15],
        [ 1,  1, 18],
        [ 1,  1,  4],
        [ 1,  1, 14],
        [ 1,  1,  3]])
trg: tensor([[ 2,  2,  2],
        [ 9, 15,  7],
        [ 6,  5,  8],
        [ 7,  7,  9],
        [22,  6, 22],
        [ 5,  9,  5],
        [15,  8, 15],
        [15,  9, 15],
        [22,  8, 22],
        [33,  9, 33],
        [ 3,  3,  3]])
i 1
src: tensor([[ 2,  2,  2],
        [ 9, 19,  6],
        [ 0,  5,  5],
        [ 4, 12,  0],
        [11,  0,  9],
        [24,  3,  4],
        [ 0,  1,  0],
        [ 0,  1,  7],
        [22,  1,  6],
        [ 0,  1,  5],
        [10,  1,  0],
        [ 0,  1,  0],
        [ 4,  1,  0],
        [16,  1,  4],
        [ 8,  1, 15],
        [17,  1, 21],
        [ 0,  1,  7],
        [ 6,  1, 20],
        [ 0,  1,  0],
        [ 0,  1, 24],
        [10,  1,  0],
        [ 0,  1,  7],
        [ 4,  1,  0],
        [23,  1, 12],
        [ 3,  1,  0],
        [ 1,  1,  3]])
trg: tensor([[ 2,  2,  2],
        [ 8, 22,  9],
        [ 9,  5,  5],
        [ 4,  8, 33],
        [22, 22,  7],
        [ 5,  5,  8],
        [15, 15,  9],
        [15, 15,  7],
        [22, 22,  8],
        [33, 33,  9],
        [ 3,  3,  3]])
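
One way to make the shapes line up, assuming you keep the data seq-first as in the printouts and feed the decoder one numeric target value per step (input_size=1; both are assumptions about your setup, not your original code): leave batch_first at its default of False in both LSTMs, so the encoder's hidden and cell come out as [n_layers, batch, hidden_dim], and shape each decoder step as [1, batch, 1] so the LSTM stays in batched mode and accepts those 3-D states. A minimal sketch:

import torch
import torch.nn as nn

batch_size, hidden_dim, n_layers = 3, 1024, 2

# Encoder side: seq-first input [seq_len, batch, features] with the default
# batch_first=False yields hidden/cell of shape [n_layers, batch, hidden_dim].
enc_rnn = nn.LSTM(512, hidden_dim, n_layers)
embedded = torch.randn(26, batch_size, 512)    # like the embedded tensor above
_, (hidden, cell) = enc_rnn(embedded)
print(hidden.shape)                            # torch.Size([2, 3, 1024])

# Decoder side: one time step per call, shaped [1, batch, input_size], so the
# 3-D hidden/cell from the encoder match what the LSTM expects.
dec_rnn = nn.LSTM(1, hidden_dim, n_layers)     # input_size=1: one number per step
step = torch.tensor([9., 15., 7.])             # one target time step, e.g. trg[1]
step = step.unsqueeze(0).unsqueeze(-1)         # [1, 3, 1] = [seq_len=1, batch, 1]
output, (hidden, cell) = dec_rnn(step, (hidden, cell))
print(output.shape)                            # torch.Size([1, 3, 1024])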