Trouble in making dataloaders

887574002 · June 28, 2020, 4:39pm

Hi, I am trying to build data loaders for a medical image dataset. I wrote a custom dataset class and passed the transformations to it. For each image in the dataset I want two augmented versions, so I use an auxiliary class that returns two augmented versions of each image. The dataset itself works correctly. But when I build a data loader with PyTorch's DataLoader, each batch seems to contain only two images, as if it cannot batch the images the dataset produces. Here is my code. I would be grateful for any help correcting it.

    
class AguPairVox:
    def __init__(self, transform):
        
        self.transform = transform

    def __call__(self, vox):
        
        voxi = self.transform(vox)
        voxj = self.transform(vox)
        
        return(voxi,voxj)

Here is the main custom dataset:

import os

import numpy as np
from torch.utils.data import Dataset, DataLoader

class NPDataSet(Dataset):
    
    
    def __init__(self,root_dir, transform=None):
        
        self.root_dir = root_dir
        
        VolList = sorted([vocname for vocname in os.listdir(root_dir)])
        
        # path to each volume
        self.pathvol = sorted([os.path.join(self.root_dir,vol) for vol in VolList])
        
        self.transform = transform
        
        self.len = len(VolList)
        
        
    def __len__(self):
        
        return(self.len)
    
    
    def __getitem__(self,index):
        
        # the stored shape is (depth, height, width, channel)
        
        self.vol = np.load(self.pathvol[index])
        print(self.pathvol[index])
        
        # check that the volume is 4-D; the target layout is (channel, depth, height, width)
        
        assert len(self.vol.shape)==4
        
        # transpose the axes if the layout is (depth, height, width, channel)
        
        if self.vol.shape[-1] ==1:
            
            self.vol = np.transpose(self.vol,(3,0,1,2))
        
        # apply the transformations

        if self.transform:
            
            aug  = AguPairVox(self.transform)
            
            self.vol = aug(self.vol)
  
        return(self.vol)

Making the data loader:

train_loader = DataLoader(train_set, batch_size=15, drop_last=False, shuffle=False)

Now, when I check the length of a batch:

for batch in train_loader:
    
    print(len(batch))
    
    break

2

ptrblck · June 29, 2020, 9:36am

I assume self.vol contains both images?
If so, try returning both tensors separately:

return self.vol[0], self.vol[1]

and rerun the script.

887574002 · June 29, 2020, 6:50pm

Hi @ptrblck, thank you. I did it and it still returns a batch of size 2!

harsha_g · June 29, 2020, 10:02pm

@887574002 I trust you verified that the elements of the batch are images. But, just in case, is it possible that the batch is a tuple (of size 2) where each element is a tensor of shape batch_size x shape_of_transformed_image?
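
One way to check this is to inspect a single batch directly. The following is only a minimal sketch: `PairDataset` is a hypothetical stand-in for the dataset in the question, and random tensors take the place of the two augmented volumes. The sample shape and the number of samples are assumptions chosen for illustration.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class PairDataset(Dataset):
    """Hypothetical stand-in for NPDataSet: returns two 'views' per index."""

    def __init__(self, n_samples=30, shape=(1, 4, 8, 8)):
        self.n_samples = n_samples
        self.shape = shape  # assumed (channel, depth, height, width)

    def __len__(self):
        return self.n_samples

    def __getitem__(self, index):
        # Two independent random tensors play the role of the two augmentations.
        return torch.rand(self.shape), torch.rand(self.shape)


loader = DataLoader(PairDataset(), batch_size=15, shuffle=False)
batch = next(iter(loader))

# The batch has one entry per item returned by __getitem__.
print(type(batch), len(batch))
for view in batch:
    # Each entry carries the batch dimension first.
    print(view.shape)
```

If the hypothesis holds, `len(batch)` reports the number of returned views, while the first dimension of each entry reports the batch size.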

887574002 · June 30, 2020, 8:59am

Hi, sorry, that was my mistake. You are right. Each batch is a tuple, and each element of that tuple is a tensor of shape [batch_size, h, w, d]. I wrongly assumed that the length of a batch would equal the batch_size passed to the data loader. Thanks for pointing that out.
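
For anyone who hits the same confusion, here is a rough pure-Python sketch of why the length comes out as 2. `collate_pairs` is a hypothetical, simplified stand-in for what the default collation does with samples that are pairs; it is not PyTorch's actual code, and it uses strings and lists where the real loader would stack tensors.

```python
def collate_pairs(samples):
    """Simplified stand-in for default collation (not PyTorch's actual code)."""
    # samples is a list with batch_size entries, each a (voxi, voxj) pair.
    # zip(*samples) regroups them into one sequence per position in the pair.
    return [list(group) for group in zip(*samples)]


# Placeholder strings stand in for the augmented volumes.
samples = [(f"voxi_{k}", f"voxj_{k}") for k in range(15)]
batch = collate_pairs(samples)

print(len(batch))     # 2: one entry per item returned by __getitem__
print(len(batch[0]))  # 15: the batch size lives inside each entry
```

So in the training loop the two views can be unpacked directly, for example with `for voxi, voxj in train_loader:`.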