DataLoader adds a batch dimension to the tensor shape

I am using DataLoader to load my data in batches. It loads all the data fine, but the batch size becomes the first dimension in the shape of the returned tensor. So when I use a GRU or LSTM model, it complains that the input has an extra dimension. Is there a way to avoid this? For example, my dataset items have shape [4, 1, 6], but the DataLoader returns shape [1, 4, 1, 6], which LSTM or GRU do not accept.

Even if I set batch_size = 1, it still prepends a 1 to the shape, giving [1, 4, 1, 6].

How can I get the returned data to have shape [4, 1, 6] instead of [1, 4, 1, 6]?
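The behavior described can be reproduced with a minimal sketch (the `ToyDataset` class and its contents are hypothetical stand-ins for the poster's dataset, assuming each item has shape [4, 1, 6]):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical map-style dataset; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

loader = DataLoader(ToyDataset(), batch_size=1)
batch = next(iter(loader))
# The default collate function stacks samples along a new leading
# batch dimension, so even batch_size=1 yields [1, 4, 1, 6].
print(batch.shape)  # torch.Size([1, 4, 1, 6])
```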

  1. In the dataset that you are passing into DataLoader, can you check what the output of its __getitem__ is (or __iter__ for an iterable-style dataset)? Specifically, check the dimension of that output.

  2. If the dimension of the above is correct, can you check whether auto-collation is being applied to your output? Is that step changing the dimension?
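Step 1 can be checked by indexing the dataset directly, before any DataLoader is involved (a sketch; `ToyDataset` here is a hypothetical stand-in for the poster's dataset):

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 1

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

# Indexing bypasses the DataLoader, so this shows the shape
# before collation adds the batch dimension.
item = ToyDataset()[0]
print(item.shape)  # torch.Size([4, 1, 6])
```

If this prints the expected shape, the extra dimension is coming from the DataLoader's collation step, not from the dataset itself.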

Set batch_size=None in the DataLoader.
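With batch_size=None, the DataLoader disables automatic batching and yields each dataset item as-is, without prepending a batch dimension. A sketch, using a hypothetical dataset whose items have shape [4, 1, 6]:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

# batch_size=None turns off automatic batching: no collation,
# no extra leading dimension -- items come out unchanged.
loader = DataLoader(ToyDataset(), batch_size=None)
sample = next(iter(loader))
print(sample.shape)  # torch.Size([4, 1, 6])
```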

If I set batch_size to None, will it read the entire dataset, defeating the purpose of splitting the training/validation dataset into batches?

If all you need is to get data of shape [4, 1, 6] from the [1, 4, 1, 6]-sized batch, you can write data = data.squeeze(0) (or data = data[0]) after getting the data.
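Both suggestions drop the leading batch dimension and give the same result (a sketch with a placeholder tensor of the shapes from the question):

```python
import torch

data = torch.zeros(1, 4, 1, 6)  # batch as returned by the DataLoader

squeezed = data.squeeze(0)  # removes dim 0 only if its size is 1
indexed = data[0]           # equivalent here, since batch_size=1

print(squeezed.shape)  # torch.Size([4, 1, 6])
print(indexed.shape)   # torch.Size([4, 1, 6])
```

Note that squeeze(0) is a no-op if dim 0 is larger than 1, so it is safe to leave in place; data[0] would instead silently discard all but the first sample of a larger batch.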

Actually, setting batch_size = None worked. Thanks David.
