DataLoader adds a batch dimension to the tensor shape

I am using DataLoader to load my data in batches. It loads all the data fine, but the batch size becomes the first dimension in the shape of the returned tensor. So when I use a GRU or LSTM model, it complains that the input has an extra dimension. Is there a way to avoid this? For example, my dataset items have shape [4, 1, 6], but the DataLoader returns shape [1, 4, 1, 6], which LSTM or GRU do not accept.

Even if I set batch_size = 1, it still prepends a 1 to the shape, giving [1, 4, 1, 6].

How can I get the returned data to have shape [4, 1, 6] instead of [1, 4, 1, 6]?
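The behavior described can be reproduced with a minimal sketch (the `ToyDataset` class and its contents are hypothetical stand-ins for the poster's dataset, assuming each item has shape [4, 1, 6]):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical map-style dataset; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

loader = DataLoader(ToyDataset(), batch_size=1)
batch = next(iter(loader))
# The default collate function stacks samples along a new leading
# batch dimension, so even batch_size=1 yields [1, 4, 1, 6].
print(batch.shape)  # torch.Size([1, 4, 1, 6])
```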

  1. In the dataset that you are passing into DataLoader, can you check what the output of its __getitem__ is (or __iter__ for an iterable-style dataset)? Specifically, check the dimension of that output.

  2. If the dimension of the above is correct, can you check whether auto-collation is being applied to your output? Is that step changing the dimension?
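Step 1 can be checked by indexing the dataset directly, before any DataLoader is involved (a sketch; `ToyDataset` here is a hypothetical stand-in for the poster's dataset):

```python
import torch
from torch.utils.data import Dataset

class ToyDataset(Dataset):
    """Hypothetical stand-in; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 1

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

# Indexing bypasses the DataLoader, so this shows the shape
# before collation adds the batch dimension.
item = ToyDataset()[0]
print(item.shape)  # torch.Size([4, 1, 6])
```

If this prints the expected shape, the extra dimension is coming from the DataLoader's collation step, not from the dataset itself.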

Set batch_size=None in the DataLoader.
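With batch_size=None, the DataLoader disables automatic batching and yields each dataset item as-is, without prepending a batch dimension. A sketch, using a hypothetical dataset whose items have shape [4, 1, 6]:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical stand-in; each item has shape [4, 1, 6]."""
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        return torch.zeros(4, 1, 6)

# batch_size=None turns off automatic batching: no collation,
# no extra leading dimension -- items come out unchanged.
loader = DataLoader(ToyDataset(), batch_size=None)
sample = next(iter(loader))
print(sample.shape)  # torch.Size([4, 1, 6])
```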

If I set batch_size to None, will it read the entire dataset, defeating the purpose of splitting the training/validation dataset into batches?

If all you need is to get data of shape [4, 1, 6] from the [1, 4, 1, 6]-sized batch, you can write data = data.squeeze(0) (or data = data[0]) after getting the data.
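Both suggestions drop the leading batch dimension and give the same result (a sketch with a placeholder tensor of the shapes from the question):

```python
import torch

data = torch.zeros(1, 4, 1, 6)  # batch as returned by the DataLoader

squeezed = data.squeeze(0)  # removes dim 0 only if its size is 1
indexed = data[0]           # equivalent here, since batch_size=1

print(squeezed.shape)  # torch.Size([4, 1, 6])
print(indexed.shape)   # torch.Size([4, 1, 6])
```

Note that squeeze(0) is a no-op if dim 0 is larger than 1, so it is safe to leave in place; data[0] would instead silently discard all but the first sample of a larger batch.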

Actually, setting batch_size = None worked. Thanks David.
