[SOLVED] Need help with custom DataLoader: batches come back with the wrong size, and num_workers > 0 hangs the Jupyter notebook

EDITED: Everything fails because of this line that I kept ignoring. `self.image_paths` is the string 'D:/lineartPSDs/A/'. Guess how long that string is? 17 characters. Oops. On the bright side, I am more experienced with DataLoader now.

def __len__(self):
    return len(self.image_paths)
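For anyone skimming: the fix is to count rows of the CSV (one row per image pair) rather than characters of the folder path. A minimal sketch of the difference (the toy DataFrame here is made up to mirror the CSV layout):

```python
import pandas as pd

# len() on the directory string counts characters -- this is the bug:
portrait_dir = 'D:/lineartPSDs/A/'
print(len(portrait_dir))  # 17

# len() on the DataFrame read from the CSV counts rows, i.e. image pairs:
image_names = pd.DataFrame({'portrait': ['p%d.png' % i for i in range(81)],
                            'lineart': ['l%d.png' % i for i in range(81)]})
print(len(image_names))  # 81

# so __len__ should be:
#     def __len__(self):
#         return len(self.image_names)
```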

I have a tiny dataset of 81 portraits paired with 81 line-art styles, separated into 2 folders. Every time a sample is fetched, both images in the pair are randomly cropped the same way. I have already checked that this part works. My issue is when I try to batch the pairs together with the DataLoader.

portrait_dataset = MyDataset(csv_file = csv_file,
                             portrait_paths=portrait_dir, 
                             lineart_paths=lineart_dir)

for i in range(len(portrait_dataset)):
    sample = portrait_dataset[i]

    print(i, sample['portrait'].size(), sample['lineart'].size())

    if i == 3:
        break

The above prints the output below. If I didn't break at 3, it would print all 81 pairs.

0 torch.Size([3, 256, 256]) torch.Size([3, 256, 256])
1 torch.Size([3, 256, 256]) torch.Size([3, 256, 256])
2 torch.Size([3, 256, 256]) torch.Size([3, 256, 256])
3 torch.Size([3, 256, 256]) torch.Size([3, 256, 256])

The for loop below returns the wrong batches, and it hangs outright if I set num_workers to anything bigger than 0.

dataloader = DataLoader(portrait_dataset, batch_size=8,
                        shuffle=False, num_workers=0)

for i_batch, sample_batched in enumerate(dataloader):
    print(i_batch, sample_batched.keys(),
          'PORTRAIT:', sample_batched['portrait'].size(),
          'LINEART:', sample_batched['lineart'].size())

This prints:

0 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
1 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
2 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
3 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
4 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
5 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
6 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
7 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([2, 3, 256, 256]) LINEART: torch.Size([2, 3, 256, 256])
8 dict_keys(['portrait', 'lineart']) PORTRAIT: torch.Size([1, 3, 256, 256]) LINEART: torch.Size([1, 3, 256, 256])

The run above produced 9 batches (indices 0–8), and the batch dimension is 2, not 8.
With batch_size = 1 I get back only 17 total pairs.
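The numbers are consistent with a dataset length of 17. The batch dimension printed above is 2, so that run presumably used batch_size=2 rather than the 8 in the snippet; either way the arithmetic works out:

```python
# __len__ is measuring the folder path string, not the image list:
n = len('D:/lineartPSDs/A/')
print(n)  # 17

# walking 17 samples two at a time yields the 9 batches shown above:
batch_sizes = [min(2, n - i) for i in range(0, n, 2)]
print(batch_sizes)  # [2, 2, 2, 2, 2, 2, 2, 2, 1]
```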

What happened to the rest of the data? Also, if I set num_workers to anything bigger than 0, the entire for loop hangs indefinitely. Can someone tell me what I did wrong? I have posted the rest of the code below!
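On the num_workers hang: I believe this is a separate, well-known Windows/Jupyter issue. Worker processes on Windows are started with the 'spawn' method, which re-imports the main module, so a Dataset class defined in a notebook cell can't be reconstructed inside the workers and the loader stalls. The usual workarounds are to keep num_workers=0 inside the notebook, or to move the Dataset into an importable .py file and guard the entry point. A stdlib sketch of the constraint spawn imposes (the `fetch` helper is hypothetical, standing in for `__getitem__`):

```python
import multiprocessing as mp

def fetch(idx):
    # stand-in for Dataset.__getitem__; must live at module top level so
    # spawned workers can re-import it
    return idx * 2

if __name__ == '__main__':
    # 'spawn' (the Windows start method) re-imports this module in every
    # worker, so unguarded module-level code would run again in each one
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(fetch, range(4)))  # [0, 2, 4, 6]
```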

import os, random
import torch
import pandas as pd
import numpy as np
import torchvision.transforms.functional as TF
import matplotlib.pyplot as plt
from skimage import io, transform
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms, utils
from PIL import Image

lineart_dir = 'D:/...' # Contains 81 styles
portrait_dir= 'D:/...'  # Contains 81 portraits
csv_file='D:/.../data_set.csv' # A csv file contains the list of file names, column 0 = Portrait, 1 = LineArt

class MyDataset(Dataset):
    def __init__(self, csv_file, portrait_paths, lineart_paths, train=True):
        self.image_paths = portrait_paths
        self.target_paths = lineart_paths
        self.image_names = pd.read_csv(csv_file)

    def transform(self, image, mask):
        # RandomResizedCrop: sample one set of crop parameters, then apply
        # the same crop to both images
        i, j, h, w = transforms.RandomResizedCrop.get_params(
            image, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3))
        image = TF.crop(image, i, j, h, w)
        mask = TF.crop(mask, i, j, h, w)

        # Resize
        resize = transforms.Resize(size=(256, 256))
        image = resize(image)
        mask = resize(mask)

        # Random horizontal flipping
        if random.random() > 0.5:
            image = TF.hflip(image)
            mask = TF.hflip(mask)
            
        # Random vertical flipping
#         if random.random() > 0.5:
#             image = TF.vflip(image)
#             mask = TF.vflip(mask)

        # Transform to tensor
        image = TF.to_tensor(image)
        mask = TF.to_tensor(mask)
        return image, mask
        

    def __getitem__(self, idx):
        # column 0 = portrait file name, column 1 = line-art file name
        image = Image.open(os.path.join(self.image_paths, self.image_names.iloc[idx, 0]))
        mask = Image.open(os.path.join(self.target_paths, self.image_names.iloc[idx, 1]))
        
        x, y = self.transform(image, mask)
        
        sample = { 'portrait':x, 'lineart':y }
        return sample

    def __len__(self):
        # BUG (see the EDIT at the top): self.image_paths is the folder path
        # string, so this returns its character count, not the image count
        return len(self.image_paths)
    
portrait_dataset = MyDataset(csv_file = csv_file,
                             portrait_paths=portrait_dir, 
                             lineart_paths=lineart_dir)

dataloader = DataLoader(portrait_dataset, batch_size=8,
                        shuffle=False, num_workers=0)
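With `__len__` fixed to count CSV rows, the loader yields full batches. A self-contained sanity check with a made-up `ToyPairs` dataset (zero tensors standing in for the real images, so no files are needed):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyPairs(Dataset):
    """Stand-in for MyDataset: same dict-of-tensors samples, no disk I/O."""
    def __init__(self, n=81):
        self.n = n

    def __len__(self):
        return self.n  # count samples, not characters of a path string

    def __getitem__(self, idx):
        return {'portrait': torch.zeros(3, 256, 256),
                'lineart': torch.zeros(3, 256, 256)}

loader = DataLoader(ToyPairs(81), batch_size=8, shuffle=False, num_workers=0)
sizes = [batch['portrait'].shape[0] for batch in loader]
print(sizes)  # ten full batches of 8, then a final batch of 1
```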

Were you able to solve this issue? I am having a similar issue.