Need help racking my brain on batch_size

I am trying to work out some inputs and outputs for an LSTM model, and I want to make sure I understand a few things, because I am having a hard time troubleshooting.

Is “batch_size” as it pertains to the LSTM model, the same “batch_size” we talk about with DataLoader?


I have a set of data that is 2000 rows and 30495 columns. I have a Dataset that hands these out as 28 rows x 30495 columns at a time (4 weeks of data, each row is a day). Basically my Dataset just “rolls” forward to give out the next 28 days each time, so the DataLoader receives samples of 28x30495. I have the DataLoader set to a “batch_size” of 20.

So with LSTM there is some terminology I want to get straight. The actual data being passed into the model has shape torch.Size([20, 28, 30495]). I set “batch_first=True” because I am assuming that the batch_size of 20 from my DataLoader is the same “batch_size” the LSTM refers to, and it is the first dimension of my tensor.

Now in my example, input_size is 30495 correct? What about seq_length, would that be 28 (28 rows in each object)?

I appreciate any help on this, I think once I can get some of these things straight I can make progress.
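
For reference, the naming can be checked on a tiny stand-in. This is only a sketch: the feature count of 4 and hidden size of 8 are made-up values so that it runs quickly, where my real data would have 30495 features.

```python
import torch
import torch.nn as nn

# Hypothetical small sizes standing in for (20, 28, 30495)
batch_size, seq_len, input_size, hidden_size = 20, 28, 4, 8

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
x = torch.zeros(batch_size, seq_len, input_size)  # (batch, seq_len, input_size)

# When no initial state is passed, h_0 and c_0 default to zeros
output, (h_n, c_n) = lstm(x)

print(output.shape)  # (batch, seq_len, hidden_size)
print(h_n.shape)     # (num_layers, batch, hidden_size)
```

Note that batch_first=True only changes the layout of the input and output tensors; h_n and c_n keep the (num_layers, batch, hidden_size) layout either way.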

Here is some more info on my problem. I am trying to get the output of an LSTM to pass into a Linear layer. I can’t get it to work for the life of me, and that is why I am trying to make sure I have my sizes straight in my first post above.

Here is the model:

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class LSTM(nn.Module):

    def __init__(self, num_classes, input_size, hidden_size, num_layers, batch_size):
        super(LSTM, self).__init__()
        self.num_classes = num_classes
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.batch_size = batch_size
        self.dropout = nn.Dropout(p=0.2)
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True, dropout=0.25)

        self.fc = nn.Linear(hidden_size, num_classes)

    def init_hidden(self):
        # Zero initial states, shape (num_layers, batch_size, hidden_size)
        self.h_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)
        self.c_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_size).to(device)

    def forward(self, x):
        # x torch.Size([20, 28, 30495])
        lstm_output, (h_n, c_n) = self.lstm(x, (self.h_0, self.c_0))
        # lstm_output torch.Size([20, 28, 512])
        # h_n torch.Size([1, 20, 512])
        lstm_output = torch.flatten(lstm_output, start_dim=1)
        # lstm_output torch.Size([20, 14336])
        out = self.fc(lstm_output)
        out = self.dropout(out)
        return out

Originally I got an error that lstm_output was the wrong size

RuntimeError: The size of tensor a (28) must match the size of tensor b (20) at non-singleton dimension 1

So that is why I used flatten. But with the code the way it is above, it gives another error:

RuntimeError: size mismatch, m1: [20 x 14336], m2: [512 x 30490] at /tmp/pip-req-build-8whce6xx/aten/src/THC/generic/

If I send h_n into the linear layer instead of lstm_output, it works fine, in fact I can just use view to flatten h_n and that works. But I don’t know that I want to send h_n. It would seem the proper object to send is lstm_output.

Obviously I have a mismatch somewhere. If I set batch_size to 28, everything works when sending lstm_output into the linear layer, and I don’t even have to use flatten. But that’s why I think I have something mixed up, because I should be able to set batch_size to whatever I want, correct?

Well, as can be seen here, the variable lstm_output is of size torch.Size([20, 28, hidden_size]), i.e. you get the hidden activations of every step of the sequence. So when you flatten it from dimension 1, you get a tensor of size torch.Size([20, 28*hidden_size]). Your linear layer was built with in_features == 512 (the m2: [512 x 30490] in your runtime error), but 28*512 == 14336, which is the m1: [20 x 14336] you are passing in, hence the size mismatch.
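
A minimal sketch of the same mismatch, with made-up small sizes (hidden size 8 instead of 512, 5 outputs instead of 30490) and a zero tensor standing in for the real LSTM output:

```python
import torch
import torch.nn as nn

# Hypothetical small sizes: hidden_size 8 instead of 512, 5 outputs instead of 30490
batch_size, seq_len, hidden_size, num_outputs = 20, 28, 8, 5

lstm_output = torch.zeros(batch_size, seq_len, hidden_size)  # stand-in for the LSTM output
fc = nn.Linear(hidden_size, num_outputs)

flat = torch.flatten(lstm_output, start_dim=1)
print(flat.shape)  # (batch, seq_len * hidden_size): all time steps side by side

try:
    fc(flat)  # fc expects hidden_size features but receives seq_len * hidden_size
    mismatch = False
except RuntimeError:
    mismatch = True

print(mismatch)
```

The exact wording of the error depends on the PyTorch version, but the cause is the same: the last dimension of the input has to equal the in_features of the Linear layer.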

Yes, it’s 20x14336. My dimensions just aren’t correct each step of the way; I am doing something wrong. Here is a better flow:

My data is rows of 28 x 30495
Dataloader grabs batch_size 20, so 20x28x30495 and throws that to my lstm

And here is how my objects are changing

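for data, target in trainX:
    # The right-hand side was missing in the original post; .to(device) is assumed
    data, target = data.to(device), target.to(device)
    # data torch.Size([20, 28, 30495])
    # target torch.Size([20, 30490])

    outputs = lstm(data)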

Inside my model

    def forward(self, x):                                             
        # x torch.Size([20, 28, 30495])
        # Propagate input through LSTM
        lstm_output, (h_n, c_n) = self.lstm(x, (self.h_0, self.c_0))  
        # lstm_output torch.Size([20, 28, 512])
        # h_n torch.Size([1, 20, 512])
        lstm_output = torch.flatten(lstm_output, start_dim=0, end_dim=1)
        # lstm_output  torch.Size([560, 512])
        out = self.fc(lstm_output)
        # out  torch.Size([560, 30490])
        out = self.dropout(out)
        # out torch.Size([560, 30490])        
        return out

The above fails with
RuntimeError: The size of tensor a (560) must match the size of tensor b (20) at non-singleton dimension 0

Because my loss function wants the output to be 20x30490. So I worry I am somehow getting 28 (the number of rows in a single sample) confused with 20 (the number of samples in a batch). And I don’t want to just do reshape/flatten gymnastics to make something fit unless I am doing it the right way. I fear I am doing something fundamentally wrong that is tripping everything up, but I don’t see it.

Should I be sending h_n or lstm_output to my linear layer? Perhaps that is where I am making the mistake. If I am to send h_n then when is lstm_output used or sent anywhere? This is a many to many LSTM. I am taking in 20x28x30495 and outputting 20x28x30490.


I made this up. See if this helps. If not, please feel free to ignore it or flag it for deletion.

# I assumed many-to-many classification
# num_classes = 50
# input_size = 30495
# hidden_size = 128
# batch_size = 20
# num_layers = 2

# model = LSTM(num_classes, input_size, hidden_size, num_layers, batch_size)

# Input to LSTM = (20, 28, 30495) (batch, seq_len, input_size)
# lstm_output shape = (20, 28, 128) (batch, seq_len, num_directions * hidden_size)
# there's no need to flatten the lstm_output shape
# Let's feed it as is to the Feed Forward layer -->  nn.Linear(hidden_size, num_classes)
# hidden_size is H_in and num_classes is H_out (when you refer to documentation for nn.Linear)
# input to nn.Linear = (20, 28, 128) (N, ∗, H_in)
# output will be = (20, 28, 50) (N, *, H_out)

# Now, try to make sense of this output shaped (20, 28, 50): 
# You have a batch of 20 samples.
# Each sample is a sequence of length 28 (corresponding to each day of the 2 or 4 weeks of whatever)
# For each day you had 30495 features (initially) and now you ended up with 50 features.
# This number 50 corresponds to the number of classes (remember, H_out)
# So, when you feed this to CrossEntropyLoss (for multiclass classification)
# Note you want to calculate 28 losses. One for each time step.
# The input shape to the CELoss for one timestep will be (20, 50) (N, C)
# See the documentation for nn.CrossEntropyLoss for more info on that.
# How to calculate all the 28 losses? I don't know. My DL is good only that far. :)
# For more on that, I'll refer you here

To the community at large, feel free to add to my response or correct it if there’s anything wrong. I am happy to learn. :)
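
On the question of computing all 28 losses at once, here is one possible sketch. It assumes the same many-to-many classification setup as above, with random logits and made-up integer class targets of shape (20, 28), which are not from the original problem. nn.CrossEntropyLoss accepts inputs of shape (N, C, d1) with targets of shape (N, d1), so the per-step losses do not have to be looped over by hand:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, seq_len, num_classes = 20, 28, 50

# Stand-in for the Linear layer output, shape (N, seq_len, C)
logits = torch.randn(batch_size, seq_len, num_classes)
# Made-up targets: one class index per sample per day, shape (N, seq_len)
targets = torch.randint(0, num_classes, (batch_size, seq_len))

criterion = nn.CrossEntropyLoss()

# Option 1: move the class dimension to position 1, giving (N, C, seq_len)
loss_a = criterion(logits.permute(0, 2, 1), targets)

# Option 2: fold the time steps into the batch dimension, giving (N * seq_len, C)
loss_b = criterion(logits.reshape(-1, num_classes), targets.reshape(-1))

print(loss_a.item(), loss_b.item())
```

With the default reduction="mean", both options average over all 20 * 28 predictions, so they should produce the same scalar.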

Your issue is that you are taking all the time steps in the series of hidden states without any aggregation. You should aggregate these outputs in a way that produces a tensor of size torch.Size([20, hidden_size]), since you are trying to get a torch.Size([20, 30490]) tensor out of the Linear layer. There are a couple of “mainstream” ways to achieve that:

>>> lstm_outputs = torch.zeros(20, 28, 512)
>>> lstm_outputs.mean(dim=1).size() # mean of hidden activations through all time steps
torch.Size([20, 512])
>>> lstm_outputs[:, -1].size() # index sequence dimension, get last hidden activation
torch.Size([20, 512])