The length of the loader will adapt to the batch_size. So if your train dataset has 1000 samples and you use a batch_size of 10, the loader will have a length of 100.
Note that the last batch yielded by your loader can be smaller than the actual batch_size if the dataset size is not evenly divisible by the batch_size. E.g. for 1001 samples and a batch_size of 10, train_loader will have len(train_loader)=101 and the last batch will only contain 1 sample. You can avoid this by setting drop_last=True.
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    def __init__(self, size):
        self.x = torch.randn(size, 1)

    def __getitem__(self, index):
        return self.x[index]

    def __len__(self):
        return len(self.x)


dataset = MyDataset(1001)

# Default: drop_last=False, so the smaller final batch is kept
data_loader = DataLoader(dataset, batch_size=10)
print(len(data_loader))  # 101

for batch_idx, data in enumerate(data_loader):
    print('batch idx {}, batch len {}'.format(batch_idx, len(data)))
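With 1001 samples and a batch_size of 10, this loop should print 101 batches: batches 0 through 99 with 10 samples each, and batch 100 with the single leftover sample.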
# With drop_last=True the incomplete final batch is dropped
data_loader = DataLoader(dataset, batch_size=10, drop_last=True)
print(len(data_loader))  # 100

for batch_idx, data in enumerate(data_loader):
    print('batch idx {}, batch len {}'.format(batch_idx, len(data)))
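More generally, for a map-style dataset with the default sampler, len(loader) works out to ceil(len(dataset) / batch_size) without drop_last and floor(len(dataset) / batch_size) with it. A minimal sketch of that arithmetic for this example (the N and batch_size names here are just for illustration):

import math

N, batch_size = 1001, 10
print(math.ceil(N / batch_size))  # 101 batches when drop_last=False
print(N // batch_size)            # 100 batches when drop_last=True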