DataLoader doesn't add the batch dimension

Hello,

I wrote my own Dataset and tried to use it with a DataLoader. Everything seems to work, but the loaded data doesn't get a batch dimension.

I have a 3x64x64 RGB image and a 1x64x64 grayscale image, and I concatenate them in my Dataset to get a 4x64x64 tensor. After passing through the DataLoader, the output should have the shape 64x4x64x64 (batch size 64), but it still has 4x64x64. Any suggestions or ideas?
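As far as I understand, the DataLoader's default collate_fn should stack the individual samples along a new first dimension, roughly like this (a minimal sketch with dummy tensors):

import torch

samples = [torch.randn(4, 64, 64) for _ in range(64)]  # one tensor per __getitem__ call
batch = torch.stack(samples, dim=0)                     # what the default collate_fn does
print(batch.shape)                                      # torch.Size([64, 4, 64, 64])

Here is my Dataset: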

import torch
from torch.utils.data import Dataset
import torchvision.datasets as dset
import torchvision.transforms as transforms


class MyDataset(Dataset):
    def __init__(self, path_grain, path_mask, transform=None):
        # RGB "grain" images: 3 channels, normalized to [-1, 1]
        self.data_grain = dset.ImageFolder(
            root=path_grain,
            transform=transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
            ]))
        # grayscale masks: 1 channel, normalized to [-1, 1]
        self.data_mask = dset.ImageFolder(
            root=path_mask,
            transform=transforms.Compose([
                transforms.Grayscale(num_output_channels=1),
                transforms.ToTensor(),
                transforms.Normalize((0.5,), (0.5,)),
            ]))

    def __getitem__(self, index):
        x_grain, y = self.data_grain[index]
        x_mask, _ = self.data_mask[index]
        x = torch.cat((x_grain, x_mask), dim=0)  # 3x64x64 + 1x64x64 -> 4x64x64
        return x

    def __len__(self):
        return len(self.data_grain)

dataset = MyDataset(r"..\data512to64_grain", r"..\data512to64_mask")

dataloader = torch.utils.data.DataLoader(dataset, batch_size=opt.batchSize,
                                         shuffle=True, num_workers=int(opt.workers))
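For reference, a single item from the dataset already comes out with the shape I expect:

print(dataset[0].shape)  # torch.Size([4, 64, 64])

It's only the batch dimension from the DataLoader that's missing.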

Greetings :)

Your code works fine using dummy data:

import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self):
        self.data_grain = torch.randn(100, 3, 64, 64)  # stands in for the RGB images
        self.data_mask = torch.randn(100, 1, 64, 64)   # stands in for the grayscale masks

    def __getitem__(self, index):
        x_grain = self.data_grain[index]
        x_mask = self.data_mask[index]
        x = torch.cat((x_grain, x_mask), dim=0)
        return x

    def __len__(self):
        return len(self.data_grain)

dataset = MyDataset()

dataloader = torch.utils.data.DataLoader(
    dataset, batch_size=64, shuffle=True, num_workers=0)

data = next(iter(dataloader))
print(data.shape)  # torch.Size([64, 4, 64, 64])

Could you check opt.batchSize again?
Also, could you print the shape of dataset.data_grain[0] as well as dataset.data_mask[0]?

Here's the output:

len(dataset.data_grain[0]): 2
len(dataset.data_mask[0]): 2
dataset.data_grain[0][0].shape: torch.Size([3, 64, 64])
dataset.data_mask[0][0].shape: torch.Size([1, 64, 64])
batchsize: 64

dataset.data_grain[0] is a tuple: ImageFolder returns (image, target) pairs, so the image tensor is the first element.
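For example, unpacking it explicitly (using the dataset from your post):

img, target = dataset.data_grain[0]  # ImageFolder returns (image, class_index)
print(img.shape)                     # torch.Size([3, 64, 64])
print(target)                        # class index derived from the subfolder structure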

I took a look at another example of yours, where __getitem__ also returned the target y.
I added that as well, and afterwards the code worked correctly…

def __getitem__(self, index):
    x_grain, y = self.data_grain[index]
    x_mask, _ = self.data_mask[index]
    x = torch.cat((x_grain, x_mask), dim=0)
    return x, y
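With the target returned as well, the batches now come out as expected (a quick check, assuming batch_size=64):

x, y = next(iter(dataloader))
print(x.shape)  # torch.Size([64, 4, 64, 64])
print(y.shape)  # torch.Size([64])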