Why is the input size taken as 100 for nn.LSTM() when I am supplying 2024?

Hi there,
I am using the code snippet given below. I am passing the parameter D_m=2024, but when executing the model I get the error:

RuntimeError: input.size(-1) must be equal to input_size. Expected 2024, got 100

at the line

self.lstm = nn.LSTM(input_size=D_m, hidden_size=D_e, num_layers=2, bidirectional=True, dropout=dropout)

The full model code is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LSTMModel(nn.Module):

    def __init__(self, D_m, D_e, D_h, n_classes=7, dropout=0.5):
        
        super(LSTMModel, self).__init__()
        print(D_m)
        self.n_classes = n_classes
        self.dropout   = nn.Dropout(dropout)
        self.lstm = nn.LSTM(input_size=D_m, hidden_size=D_e, num_layers=2, bidirectional=True, dropout=dropout) 
        self.matchatt = MatchingAttention(2*D_e, 2*D_e, att_type='general2')
        self.linear = nn.Linear(2*D_e, D_h)
        self.smax_fc = nn.Linear(D_h, n_classes)

    def forward(self, U, qmask, umask, att2=True):
        """
        U -> seq_len, batch, D_m
        qmask -> seq_len, batch, party
        """ 
        
        emotions, hidden = self.lstm(U)  # emotions: (seq_len, batch, 2*D_e) since bidirectional=True
        alpha, alpha_f, alpha_b = [], [], []
        
        if att2:
            att_emotions = []
            alpha = []
            for t in emotions:
                att_em, alpha_ = self.matchatt(emotions,t,mask=umask)
                att_emotions.append(att_em.unsqueeze(0))
                alpha.append(alpha_[:,0,:])
            att_emotions = torch.cat(att_emotions,dim=0)
            hidden = F.relu(self.linear(att_emotions))
        else:
            hidden = F.relu(self.linear(emotions))
        
        hidden = self.dropout(hidden)
        log_prob = F.log_softmax(self.smax_fc(hidden), dim=2)
        return log_prob, alpha, alpha_f, alpha_b, emotions

Another thing: I am not getting this error when executing the code from the terminal, only when running it through the IDE.
Any help is appreciated.

Your code is not executable, but I guess U has the wrong shape:

import torch
import torch.nn as nn

D_m = 2024
D_e = 1
dropout = 0.5
lstm = nn.LSTM(input_size=D_m, hidden_size=D_e, num_layers=2, bidirectional=True, dropout=dropout)

x = torch.randn(10, 10, D_m)
out = lstm(x)
print(out[0].shape)
# torch.Size([10, 10, 2])

# error
x = torch.randn(10, 10, 100)
out = lstm(x)
# RuntimeError: input.size(-1) must be equal to input_size. Expected 2024, got 100
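
(The output in the working case has 2*D_e = 2 features because bidirectional=True concatenates the forward and backward hidden states.)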

@ptrblck sir, how do I make it executable? I am confused about what U is here, since during model training I am only passing these five arguments:

model = LSTMModel(D_m, D_e, D_h,
                  n_classes=n_classes,
                  dropout=args.dropout)

Also, sir, I am not facing this problem when training the model from the terminal using a .sh file. Why is that?
Thanks again

U is the input to your forward method:

def forward(self, U, qmask, umask, att2=True):
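
The arguments are bound by position when you call the module, so whatever tensor you pass first becomes U inside forward. A toy sketch to illustrate (the class and shapes here are made up):

import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, U, qmask, umask, att2=True):
        # U is simply the first positional argument of the call
        return U.shape

toy = Toy()
x = torch.randn(94, 32, 100)
print(toy(x, None, None))  # torch.Size([94, 32, 100]); x was bound to U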

Yes sir, that I can see, but aren't the model inputs just bound to those parameter names positionally? This is something new for me :slight_smile: I need to study it.
Meanwhile, sir, I printed the shape of U: it is torch.Size([94, 32, 100]).
Where it gets this shape from, I don't know as of now, but it is now intuitive why the LSTM sees 100.
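I printed it roughly like this at the top of forward:

def forward(self, U, qmask, umask, att2=True):
    print(U.shape)  # torch.Size([94, 32, 100])
    ...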

Can I do something here manually to make it run? I tried

def forward(self, U=2024, qmask, umask, att2=True):

but this is not syntactically correct (non-default arguments cannot follow a default argument).

No, assigning a default int value to U won’t solve the issue, as it’s caused by a shape mismatch in your tensor.
The LSTM layer expects U to have 2024 features while U has only 100 features in your case, so you might want to change the LSTM setup.
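
For example, here is a minimal sketch of two possible fixes (hidden_size=100 is just a placeholder for your D_e):

import torch
import torch.nn as nn

U = torch.randn(94, 32, 100)  # seq_len, batch, features, as in your print

# Option 1: make the LSTM match the 100-dim input
lstm = nn.LSTM(input_size=100, hidden_size=100, num_layers=2, bidirectional=True, dropout=0.5)
out, _ = lstm(U)
print(out.shape)  # torch.Size([94, 32, 200])

# Option 2: keep input_size=2024 and project the 100-dim features up first
proj = nn.Linear(100, 2024)
lstm2 = nn.LSTM(input_size=2024, hidden_size=100, num_layers=2, bidirectional=True, dropout=0.5)
out2, _ = lstm2(proj(U))
print(out2.shape)  # torch.Size([94, 32, 200])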

Sir, could you please elaborate for me? I have included the train function here.

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# `cuda`, `args`, `writer`, and `seed_everything` are globals defined elsewhere in the script
def train_or_eval_model(model, loss_function, dataloader, epoch, optimizer=None, train=False):

    losses, preds, labels, masks = [], [], [], []
    alphas, alphas_f, alphas_b, vids = [], [], [], []
    max_sequence_len = []

    assert not train or optimizer is not None
    if train:
        model.train()
    else:
        model.eval()

    seed_everything()
    for data in dataloader:
        if train:
            optimizer.zero_grad()
        
        textf, visuf, acouf, qmask, umask, label = [d.cuda() for d in data[:-1]] if cuda else data[:-1]        

        max_sequence_len.append(textf.size(0))
        
        log_prob, alpha, alpha_f, alpha_b, _ = model(textf, qmask, umask)
        lp_ = log_prob.transpose(0,1).contiguous().view(-1, log_prob.size()[2])
        labels_ = label.view(-1)
        loss = loss_function(lp_, labels_, umask)

        pred_ = torch.argmax(lp_,1)
        preds.append(pred_.data.cpu().numpy())
        labels.append(labels_.data.cpu().numpy())
        masks.append(umask.view(-1).cpu().numpy())

        losses.append(loss.item()*masks[-1].sum())
        if train:
            loss.backward()
            if args.tensorboard:
                for param in model.named_parameters():
                    writer.add_histogram(param[0], param[1].grad, epoch)
            optimizer.step()
        else:
            alphas += alpha
            alphas_f += alpha_f
            alphas_b += alpha_b
            vids += data[-1]

    if preds!=[]:
        preds  = np.concatenate(preds)
        labels = np.concatenate(labels)
        masks  = np.concatenate(masks)
    else:
        return float('nan'), float('nan'), [], [], [], float('nan'),[]

    avg_loss = round(np.sum(losses)/np.sum(masks), 4)
    avg_accuracy = round(accuracy_score(labels,preds, sample_weight=masks)*100, 2)
    avg_fscore = round(f1_score(labels,preds, sample_weight=masks, average='weighted')*100, 2)
    
    return avg_loss, avg_accuracy, labels, preds, masks, avg_fscore, [alphas, alphas_f, alphas_b, vids]

I guess now you can throw some light on this U.

The textf tensor is used as U inside the forward function of your model in:

log_prob, alpha, alpha_f, alpha_b, _ = model(textf, qmask, umask)

and raises the shape mismatch error.

Yes sir, textf is the input to the forward method. Sir, I am not getting how the list data[:-1] gives these six arguments, out of which the first is textf.
Is that right, sir?

So how do I see its shape?
The .shape attribute won't work on a list object, sir.

How can I inspect the data iterable in this line?
textf, visuf, acouf, qmask, umask, label = [d.cuda() for d in data[:-1]] if cuda else data[:-1]

So far I can only see what is inside textf by printing it.

regards

Edited:
Sir, data[:-1] is giving something like the (truncated) output below. Can I get the shape of textf from this?

[tensor([[[  3.8069,   4.0393, -18.5451,  ...,   0.6770,   9.6547,   0.7462],
          ...,
          [  0.0000,   0.0000,   0.0000,  ...,   0.0000,   0.0000,   0.0000]]]),   # textf: float features, zero-padded
 tensor([[[3.4706e-03, 2.7208e-03, 1.9824e-04,  ..., 6.5936e-01], ...]]),          # visuf: float features, zero-padded
 tensor([[[-0.2001, -0.6778, -0.4414,  ...,  0.7415, -0.8973, -0.3473], ...]]),    # acouf: float features, zero-padded
 tensor([[[0., 1.], [1., 0.], ...]]),                                              # qmask: one-hot party indicator, zero-padded
 tensor([[1., 1., 1.,  ..., 0., 0., 0.], ...]),                                    # umask: 1/0 utterance mask
 tensor([[5, 2, 2,  ..., 0, 0, 0], ...])]                                          # label: integer class ids

(output truncated; the list contains six tensors)
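
For now I guess I can just loop over the list and print each tensor's shape, something like this inside the dataloader loop:

for i, d in enumerate(data[:-1]):
    print(i, d.shape)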