What does the batch_size argument in PyTorch mean?

For example, say I have a dataset with 10 rows. I want to train an MLP/RNN/CNN on it using mini-batches.

So, let’s say I take 2 rows at a time to train: 2 x 5 = 10. That is, I train my model in batches where each batch contains 2 rows, so the number of batches = 5 and the number of rows per batch = 2.

Is my batch_size 2, or is it 5?

In the DataLoader, should batch_size = 2 or batch_size = 5?


In your example, batch_size=2.
Once you set batch_size, the DataLoader takes care of splitting the dataset into batches (5 in your case) and yielding them to you.
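
For instance, here is a minimal sketch of your 10-row scenario (the 3-feature toy tensors and labels are arbitrary placeholders, not from your question):

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# A toy dataset with 10 rows; feature dimension and labels are made up.
features = torch.randn(10, 3)
labels = torch.randint(0, 2, (10,))
dataset = TensorDataset(features, labels)

# batch_size is the number of rows per batch, so batch_size=2 here.
loader = DataLoader(dataset, batch_size=2, shuffle=True)

for x, y in loader:          # the loop runs 5 times: 10 rows / 2 rows per batch
    print(x.shape, y.shape)  # torch.Size([2, 3]) torch.Size([2])
```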


Thank you for clarifying!

Does the same definition of batch_size hold for an RNN as well?

Yes, the same definition of batch_size applies to RNNs as well.
But the addition of time steps can make things a bit tricky: RNNs take input of shape batch x time x dim, assuming all the data instances in the batch are padded to the same number of time steps.

Also, take care with the batch_first=True/False option in RNNs: with batch_first=False (the default), the expected input shape is time x batch x dim instead. The sketch below shows both layouts.
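
A minimal sketch of the two layouts (all sizes here are arbitrary examples):

```python
import torch
import torch.nn as nn

batch, time, dim, hidden = 2, 7, 3, 16  # arbitrary illustrative sizes

# With batch_first=True, nn.RNN expects input shaped (batch, time, dim).
rnn = nn.RNN(input_size=dim, hidden_size=hidden, batch_first=True)
x = torch.randn(batch, time, dim)
out, h_n = rnn(x)
print(out.shape)  # torch.Size([2, 7, 16]) -> (batch, time, hidden)

# With batch_first=False (the default), the same data must be (time, batch, dim).
rnn_default = nn.RNN(input_size=dim, hidden_size=hidden)
out_default, _ = rnn_default(x.transpose(0, 1))
print(out_default.shape)  # torch.Size([7, 2, 16]) -> (time, batch, hidden)
```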