Why do I get "input.size(-1) must be equal to input_size" error in this case?

Hello! I have been trying to figure this out for a while, but I cannot make sense of the error below:

RuntimeError: input.size(-1) must be equal to input_size. Expected 1, got 2

My training data contains only one feature column, so I pass “n_features” = 1. Labels can only be 0 or 1. Since this is binary classification, I pass “n_classes” = 1.

import torch
import torch.nn as nn

class ModuleLSTM(nn.Module):
  def __init__(self, n_features, n_classes, n_hidden=256, n_layers=3):
    super().__init__()
    self.lstm = nn.LSTM(
        input_size  = n_features,
        hidden_size = n_hidden,
        num_layers  = n_layers,
        batch_first = True,
        dropout     = 0.75
    )
    self.classifier = nn.Linear(n_hidden, n_classes)

  def forward(self, x):
    # x is expected to have shape (batch_size, seq_len, n_features)
    self.lstm.flatten_parameters()
    _, (hidden, _) = self.lstm(x)
    return self.classifier(hidden[-1])

When I debug ‘x’ in the forward method, I see that its shape is torch.Size([64, 5, 2]), where 64 and 5 correspond to the batch size and sequence length, respectively. I don’t understand why the second column of ‘x’ is full of zeros, as shown below:

tensor([[[-4.3775e-01,  0.0000e+00],
         [-4.7356e-01,  0.0000e+00],
         [-4.9494e-01,  0.0000e+00],
         [-5.2778e-01,  0.0000e+00],
         [-5.5412e-01,  0.0000e+00]],
         ...
        [[ 2.7826e+00,  0.0000e+00],
         [ 2.7535e+00,  0.0000e+00],
         [ 2.7076e+00,  0.0000e+00],
         [ 2.6636e+00,  0.0000e+00],
         [ 2.6562e+00,  0.0000e+00]]])

n_features should be the word embedding vector size (i.e. the length of each word represented as a vector).

I am not using any Embedding Layer, and my model is not an NLP model.

I apologise. I don’t know how to help you.

Thank you for your effort anyway!

Is there anyone who can help?

The x tensor in your example is the input, so I assume your input just contains zeros in one dimension?
If I use a randomly initialized input, neither hidden nor out contains all zeros.

I’m sorry for the confusion. The second column is the label column, which consists of zeros and ones. In the debugged part (until the error), all labels happen to be 0 by coincidence, which is why I thought there was a mistake. I don’t know why, but I am not able to edit my first message.

In short, this tensor seems okay, but the problem is still the error below:

RuntimeError: input.size(-1) must be equal to input_size. Expected 1, got 2

My training data contains only one feature column, so I pass “n_features” = 1. Labels can only be 0 or 1; since this is binary classification, I pass “n_classes” = 1. With this setup, I get the error above.

The model only works if I pass “n_features” = 2 and “n_classes” = 2.
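
To illustrate, here is a minimal sketch of what I mean (the data is random; only the shapes mirror my setup): with input_size = 1, the LSTM only accepts inputs whose last dimension is 1, and a last dimension of 2 reproduces exactly this error:

import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=1, hidden_size=256, num_layers=3, batch_first=True)

x_ok  = torch.randn(64, 5, 1)   # (batch_size, seq_len, n_features=1)
x_bad = torch.randn(64, 5, 2)   # last dimension is 2 instead of 1

out, (hidden, cell) = lstm(x_ok)    # works
out, (hidden, cell) = lstm(x_bad)   # RuntimeError: input.size(-1) must be equal to input_size. Expected 1, got 2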

I would be grateful if you could help me with this. Thanks @ptrblck!

Is this error raised by the model in the forward pass, or later by the loss function?
Could you post the input and target shapes as an executable code snippet to reproduce the issue?

The error is being raised in the forward pass, directly with the following line:

_, (hidden, _) = self.lstm(x)

I don’t know how to build an executable code snippet to reproduce the issue here, but I trimmed all unnecessary parts in the Colab link below. The error is present there, and if you want, you can quickly run it again (I gathered everything into 2 cells):

Google Colab: model.ipynb

Thank you for the effort!

Hi, I looked at your code.

The error is:

model = GamestagePredictor(n_features = 1, n_classes = 1)
You are passing n_features = 1, and this n_features is being used by an LSTM layer:

self.lstm = nn.LSTM(
        input_size  = n_features,
        hidden_size = n_hidden,
        num_layers  = n_layers,
        batch_first = True,
        dropout     = 0.75
    )                

The input_size argument of an LSTM layer should be a 3-D tensor, not an “integer”, like:
(batch_size, seq_len, dim)

Thank you @Tejan_Mehndiratta
According to the documentation, “input_size” should be an integer. As I stated above, if I set “n_features” = 2, it works without a problem. However, I think I should be able to set it to 1, since my training data has only one feature column. Setting it to 1 causes the error.

Oh! My bad. I got confused. Sorry.

Thanks for the code. I’ve removed all unnecessary parts, since the forward pass alone should raise the issue; it works fine on my setup:

import torch
import torch.nn as nn

class ModuleLSTM(nn.Module):

  def __init__(self, n_features, n_classes, n_hidden=256, n_layers=3):
    super().__init__()
    self.lstm = nn.LSTM(
        input_size  = n_features,
        hidden_size = n_hidden,
        num_layers  = n_layers,
        batch_first = True,
        dropout     = 0.75
    )

    self.classifier = nn.Linear(n_hidden, n_classes)

  def forward(self, x):
    self.lstm.flatten_parameters()
    _, (hidden, _) = self.lstm(x)
    out = hidden[-1]
    return self.classifier(out)


batch_size = 10
seq_len = 50
nb_features = 1
model = ModuleLSTM(nb_features, n_classes=10)
x = torch.randn(batch_size, seq_len, nb_features)
out = model(x)
print(out)

Could you check this minimal code snippet and compare it to the input shapes you are using?

Thanks. I will try, but why do we need to set “n_classes” to 10? Shouldn’t it be 1 in my case?

I just picked a random number, as the number of output features shouldn’t make a difference.
Also, since my small code snippet is running fine, please feel free to add any code to it that would reproduce your issue.

As far as I understand, the cause of the problem is as follows:

batch_size = 64
seq_len = 5
n_features = 1
n_class = 1
model = ModuleLSTM(n_features, n_class)

If I feed the setup above with the ‘x’ below, there is no problem, as you said:

x = torch.randn(batch_size, seq_len, n_features)

BUT the problem is that, with the same setup, my real data comes into ‘x’ with shape (64, 5, 2), not (64, 5, 1). I guess this is because of the way I generate the data, but I don’t know how to fix it. Let me explain step by step:

1-> I create X_train, y_train, X_val, y_val, X_test, y_test and scale the X… parts. Nothing unusual. In the rest, I will just mention the training data. I merge the scaled X_train with y_train:

trainData = pd.concat([X_train_scaled, y_train],axis=1)

2-> I create my data sequences with the function below. It fetches the first “seq_len” rows, saves them into “sequence”, fetches the label of the row right after the sequence, and appends this (sequence, label) pair to the “sequences” list. Then it moves on to the next sequence-label pair (until the end).

def create_sequences(input_data, target_name, sequence_length):
  sequences = []
  for i in range(0, len(input_data) - sequence_length, sequence_length):
    # the slice keeps all columns of input_data (features and target)
    sequence = input_data[i : i + sequence_length]
    label    = input_data.iloc[i + sequence_length][target_name]
    sequences.append((sequence, label))
  return sequences

trainSequences = create_sequences(trainData, 'gamestageEMA', seq_len)  

3-> I pass the sequences to my “DataModule” class which inherits the pl.LightningDataModule. (You can see this class and the rest in my Colab)

dataModule = DataModule(trainSequences, valSequences, testSequences, BATCH_SIZE)

It first passes “trainSequences” to my “DoomFrameDataset” class, which inherits from the torch Dataset, and as a result its “self.trainDataset” is filled. Then it returns the train DataLoader in the following way:

DataLoader(self.trainDataset, batch_size=self.batchSize, shuffle=False, num_workers=cpu_count())

4-> I create the model:

model = GamestagePredictor(n_features = 1, n_classes = 1)

and you know the rest. My GamestagePredictor class fetches the sequence and label pairs and passes them to the LSTM:

def training_step(self, batch, batch_idx):               
    sequences, labels = batch["sequence"], batch["label"]
    loss, outputs = self(sequences, labels)
    ...

I am sorry for this long message; I just wanted to be specific. In short, the problem is the way I pass my sequence and label pairs to my model. Since I set n_features = 1 and n_classes = 1, it wants to see the input with shape (64, 5, 1), but it receives (64, 5, 2) instead. I don’t know how to deal with this.
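
If it helps, here is a small sketch with made-up numbers that I think shows where the extra dimension comes from: since trainData is the concatenation of the feature column and the label column, the slice taken in create_sequences keeps both columns, so each sequence tensor ends up with a last dimension of 2:

import pandas as pd
import torch

# made-up stand-ins for my scaled feature column and the label column
trainData = pd.DataFrame({
    'feature':      [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    'gamestageEMA': [0,   0,   1,   0,   1,   1],
})

sequence = trainData[0 : 5]                    # first seq_len rows, BOTH columns
print(torch.tensor(sequence.values).shape)     # torch.Size([5, 2]) -> feature + label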

I’m unfortunately not familiar with Lightning’s DataModule and don’t know if any reshaping is done internally.
Since the shape is apparently wrong for the input, I would recommend adding print statements in all data loading classes and checking the current shape of the input to further isolate the additional values.
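
This toy example shows where I would add them (the class is just a stand-in for your DoomFrameDataset; the names are placeholders):

import torch
from torch.utils.data import Dataset, DataLoader

class DebugDataset(Dataset):
    def __init__(self, sequences):
        self.sequences = sequences                          # list of (sequence_tensor, label) pairs

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        sequence, label = self.sequences[idx]
        print("single sequence shape:", sequence.shape)     # expect (seq_len, n_features) = (5, 1)
        return {"sequence": sequence, "label": label}

sequences = [(torch.randn(5, 1), torch.tensor(0)) for _ in range(8)]
loader = DataLoader(DebugDataset(sequences), batch_size=4)
batch = next(iter(loader))
print("batched sequence shape:", batch["sequence"].shape)   # expect (batch_size, seq_len, 1) = (4, 5, 1)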

Thank you for your time and effort @ptrblck, I appreciate it.

I realized that the problem was being caused by the sequence creation function, and I solved the issue there. Now, if I pass n_features=1 and n_classes=2 (I thought I should set this to 1 since this is binary classification, but setting it to 1 raises another error), the model runs without any problem.
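
For anyone who runs into the same thing, the fix was essentially to exclude the target column from the sequence slice; roughly like this (a sketch, not my exact code):

def create_sequences(input_data, target_name, sequence_length):
  sequences = []
  for i in range(0, len(input_data) - sequence_length, sequence_length):
    # keep only the feature column(s) in the sequence, drop the target column
    sequence = input_data[i : i + sequence_length].drop(columns=[target_name])
    label    = input_data.iloc[i + sequence_length][target_name]
    sequences.append((sequence, label))
  return sequences

(As for the other error: my guess is that with n_classes=1 the loss function would need to be something like BCEWithLogitsLoss with float targets, whereas n_classes=2 matches CrossEntropyLoss with integer class labels, but I have not verified this.)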

The only remaining issue is that the model runs fine on the CPU; however, if I try to use the GPU, I get:

RuntimeError: CUDA error: device-side assert triggered

Often device assertions are triggered by e.g. invalid indices. You could run the script via CUDA_LAUNCH_BLOCKING=1 python script.py args and check the stacktrace for the failing operation.
Once you know which line of code is raising this error, you can add print statements to debug the issue further.
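
Since invalid indices are a common cause, it could also be worth checking the label range against the number of output units on the CPU first; a small sketch, assuming CrossEntropyLoss and n_classes=2:

import torch
import torch.nn as nn

n_classes = 2
logits = torch.randn(64, n_classes)
criterion = nn.CrossEntropyLoss()

labels_ok  = torch.randint(0, n_classes, (64,))            # valid class indices: 0 or 1
labels_bad = torch.full((64,), 2, dtype=torch.long)        # 2 is out of range for n_classes=2

print("label range:", labels_bad.min().item(), labels_bad.max().item())
print(criterion(logits, labels_ok))                        # works
# criterion(logits, labels_bad) raises an IndexError on the CPU,
# while the same out-of-range target triggers a device-side assert on the GPU.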
