Hi! I need help with `DataLoader` to resolve what is probably a simple misconception. Below is an MWE illustrating my use case. I pass a list of three arrays of length 1024 to a custom `Dataset`, whose `__getitem__` returns element `idx` from each array as a tuple of three values. I have checked the `Dataset` by iterating through it directly and confirmed that I can retrieve elements 0 through 1023. When wrapping it in a `DataLoader` with `batch_size=100`, I expect the `DataLoader` to return batches of size 100 (about 10 of them). However, the loop runs only once and yields a single batch containing just 3 samples. I thought I understood how `DataLoader` was supposed to work, but obviously not. Any insight is appreciated. Thanks.
```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# Create 3 arrays of size 1024
a = np.random.randn(3, 1024)
print(a.shape)   # (3, 1024)
a1, a2, a3 = a   # unpack along the first axis
print(a1.shape)  # (1024,)

class myDataset(Dataset):
    """
    Parameters
    ----------
    data : list of numpy arrays
    """
    def __init__(self, data):
        assert isinstance(data, list), "myDataset: argument must be of type list"
        self.data = data

    def __getitem__(self, idx):
        return tuple(data[idx] for data in self.data)

    def __len__(self):
        return len(self.data)

data = myDataset([a1, a2, a3])
data_iter = DataLoader(data, batch_size=100, shuffle=False)

for index, values in enumerate(data_iter):
    print("index= ", index)
    print("values= ", values)

# output of the for loop:
# index=  0
# values=  [tensor([-0.4421, -0.4562,  1.2012], dtype=torch.float64),
#           tensor([-0.8228, -0.7304,  0.6380], dtype=torch.float64),
#           tensor([ 1.2241,  0.4840, -0.0031], dtype=torch.float64)]

# I expected to collect about 10 batches of size 100.
# However, I only collect a single small batch, and the for loop
# has only a single iteration.
```
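For contrast, when I hold the same 1024 samples in a built-in `TensorDataset` (a stacking choice on my part, not part of the original MWE), the `DataLoader` batches exactly as I expect, so the issue seems specific to my custom `Dataset`:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# 1024 samples, each a row of 3 values (one value from each array)
t = torch.from_numpy(np.random.randn(1024, 3))
ds = TensorDataset(t)           # len(ds) == 1024
loader = DataLoader(ds, batch_size=100, shuffle=False)

sizes = [batch[0].shape[0] for batch in loader]
print(len(sizes))  # 11 batches
print(sizes[0])    # 100 (the last batch has the remaining 24 samples)
```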