Batch size is always 1

No matter what value I pass for batch_size, the batches I get back always have a batch dimension of 1. Here is my code:

# Wrap the dataset in a DataLoader so individual samples are stacked
# into batches along a new leading dimension.
train_dataset = DataLoader(
    dataset=dataset,
    batch_size=4,   # expect 4 samples stacked per batch
    shuffle=True,
    num_workers=0,  # load in the main process
)

and dataset is a custom dataset as follows

class ImageDataset(data.Dataset):
    """Dataset yielding ``num_augments`` augmented views of one image per item.

    Each item is a tensor of shape ``(num_augments, C, H, W)``; a DataLoader
    then prepends the batch dimension, giving
    ``(batch_size, num_augments, C, H, W)``.

    Args:
        root_dir: Directory containing the image files.
        num_augments: Number of augmented views produced per image.
        transform: Callable mapping a PIL image to a tensor. Required at
            item-access time.
        stride: Keep only every ``stride``-th file from ``root_dir``.
            Defaults to 600, matching the original hard-coded ``[::600]``
            subsampling; pass ``stride=1`` to use every image.
    """

    def __init__(self, root_dir, num_augments=2, transform=None, stride=600):
        self.root_dir = root_dir
        # Subsample the directory listing; with stride=600 a folder with
        # fewer than 601 files yields a dataset of length 1.
        self.img_names = os.listdir(root_dir)[::stride]
        self.num_augments = num_augments
        self.transform = transform

    def __getitem__(self, index):
        # Fail with a clear message instead of the NameError the original
        # raised (``img_transform`` was unbound when ``transform`` was None).
        if self.transform is None:
            raise ValueError(
                "ImageDataset requires a transform mapping a PIL image to a tensor"
            )
        img = Image.open(
            os.path.join(self.root_dir, self.img_names[index])
        ).convert('RGB')
        # Apply the (stochastic) transform once per requested view.
        views = [self.transform(img) for _ in range(self.num_augments)]
        return torch.stack(views, dim=0)

    def __len__(self):
        return len(self.img_names)

I am calling my dataset in the form

for i,images in enumerate(train_dataset):

I am expecting images to be of size [batch_size, num_augments, 3, height, width], but I am getting [1, num_augments, 3, height, width] regardless of my batch size.

Hi,

Are you sure you’re iterating over the dataloader and not the dataset?

Yep. The dataset and dataloader code looks as follows:

# Build the augmenting dataset, then hand it to a DataLoader for batching.
dataset = ImageDataset(
    root_dir=image_dir,
    num_augments=num_augments,
    transform=augment_image(),
)

train_dataset = DataLoader(
    dataset=dataset,
    batch_size=batch_size,
    shuffle=True,
    num_workers=0,
)

and then I am iterating through train_dataset

I can’t reproduce that…
Can you share a full example (maybe with a TensorDataset with random data) that reproduces this?

Here is a running piece of code

class ImageDataset(data.Dataset):
    """Minimal reproduction dataset: each item is a stack of random "views".

    ``__getitem__`` never opens a file — it returns a fresh
    ``(num_augments, 3, 224, 224)`` tensor of random values. The directory
    listing is used only to give the dataset a length.
    """

    def __init__(self, root_dir, num_augments=2, transform=None):
        self.root_dir = root_dir
        # Keep every 600th file name; only the count matters here.
        self.img_names = os.listdir(root_dir)[::600]
        self.num_augments = num_augments
        self.transform = transform  # kept for interface parity; unused

    def __getitem__(self, index):
        # One random 3x224x224 "view" per augmentation, stacked up front.
        views = [torch.randn(3, 224, 224) for _ in range(self.num_augments)]
        return torch.stack(views, dim=0)

    def __len__(self):
        return len(self.img_names)

# NOTE(review): os.listdir('') raises FileNotFoundError on POSIX — point
# root_dir at a real directory. Also ensure it holds enough files that the
# [::600] subsampling keeps more than one name, otherwise the dataset has
# length 1 and every batch necessarily has batch dimension 1.
dataset = ImageDataset(root_dir='',
                       num_augments=10,
                       transform=None)

train_dataset = DataLoader(dataset=dataset,
                           batch_size=5,
                           shuffle=True,
                           num_workers=0)

# Iterate the loader; each full batch should be (5, 10, 3, 224, 224).
# The index from enumerate() was unused, so iterate the batches directly.
# A final partial batch (len(dataset) not divisible by 5) will be smaller.
for images in train_dataset:
    test_shape = images.shape

I believe I should be getting [5, 10, 3, 224, 224], but I am getting [1, 10, 3, 224, 224].

And what is the size of the dataset? Doesn’t it contain a single element?

1 Like

Ahh I realized I made a dummy folder with a small dataset and forgot to swap the folder names again. Thanks!

1 Like