Let me explain the scenario first (which I think is not the important part). I have created a tensor of 30 images, so its shape is (30, 3, 448, 448), where 3 is the number of color channels and 448×448 is the image resolution in pixels.
But after passing it through DataLoader:
train_data = DataLoader( batch_size = 32, shuffle = False, dataset = train_data )
where x_train has the shape mentioned above.
After doing the above operation with the DataLoader, the shape of train_data becomes (30, 30, 3, 448, 448). Now my question: where did that extra 30 come from? I was expecting (30, 3, 448, 448).
Hope you all understand the problem.
I don’t fully understand the issue, as I would expect to see an output batch from the DataLoader in the shape [batch_size=32, 30, 3, 448, 448], since the DataLoader will add the batch dimension containing batch_size samples.
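To illustrate that expectation, here is a minimal sketch assuming each dataset sample really had the shape [30, 3, 448, 448] (the sample count of 5, batch_size of 2, and the 8×8 spatial size are assumptions made just to keep the example light):

```python
import torch
from torch.utils.data import DataLoader

# Assumed setup: a dataset of 5 samples, each of shape [30, 3, 8, 8]
# (8x8 stands in for 448x448 to keep memory low)
samples = torch.randn(5, 30, 3, 8, 8)
loader = DataLoader(samples, batch_size=2, shuffle=False)

# The DataLoader stacks batch_size samples along a new dim 0
for batch in loader:
    print(batch.shape)
# torch.Size([2, 30, 3, 8, 8])
# torch.Size([2, 30, 3, 8, 8])
# torch.Size([1, 30, 3, 8, 8])
```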
Also, x_train is undefined, so I assume you are passing it to the DataLoader instead of train_data?
Yes, my bad! While posting I mistakenly wrote train_data instead of x_train.
Why would it add batch_size in dim = 0? I think the DataLoader here works by dividing the data into chunks of size batch_size, so if the data has n samples, each full batch will have the shape (32, 3, 448, 448). Here n = 30, so there should be only one batch of shape (30, 3, 448, 448).
I know I am wrong somewhere, please tell me where.
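Here is a quick sketch of the behavior I expect (with an assumed n = 70 and a shrunken 8×8 image size, just to make the splitting visible):

```python
import torch
from torch.utils.data import DataLoader

# Assumed example: 70 samples, so 70 / 32 gives batches of 32, 32, and 6
data = torch.randn(70, 3, 8, 8)  # 8x8 stands in for 448x448
loader = DataLoader(data, batch_size=32, shuffle=False)

for batch in loader:
    print(batch.shape)
# torch.Size([32, 3, 8, 8])
# torch.Size([32, 3, 8, 8])
# torch.Size([6, 3, 8, 8])
```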
Thank you, Sir, for replying (@ptrblck).
Sorry, I might have misunderstood your use case, as I thought each sample has a shape of [30, 3, 448, 448], but it seems that’s the overall dataset. In this case the code works fine for me and I see the expected shape:
import torch
from torch.utils.data import DataLoader

train_data = torch.randn(30, 3, 448, 448)
loader = DataLoader(train_data, batch_size=32, shuffle=False)
for data in loader:
    print(data.shape)
# torch.Size([30, 3, 448, 448])
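Since 30 < 32, the whole dataset fits into a single batch there. A quick variation (an assumed batch_size=16, not something from the thread, with an 8×8 spatial size to keep it light) makes the splitting itself visible:

```python
import torch
from torch.utils.data import DataLoader

train_data = torch.randn(30, 3, 8, 8)  # 8x8 stands in for 448x448
loader = DataLoader(train_data, batch_size=16, shuffle=False)

# 30 samples with batch_size=16 yield two batches: 16 and 14 samples
for data in loader:
    print(data.shape)
# torch.Size([16, 3, 8, 8])
# torch.Size([14, 3, 8, 8])
```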