Hi! I need help with DataLoader to clear up what is probably a simple misconception. Below is an MWE illustrating my use case. I pass a list of three arrays of size 1024 to a custom Dataset, whose __getitem__ returns element idx from each array as a tuple of three elements. I have checked the Dataset by iterating over it directly and confirmed that I can retrieve elements 0 through 1023. With a DataLoader and batch_size=100, I expect the DataLoader to return batches of size 100. However, I only get one batch of size 1. I thought I understood how the DataLoader was supposed to work, but obviously not. Any insight is appreciated. Thanks.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
# Create 3 arrays of size 1024
a = np.random.randn(3, 1024)
print(a.shape)   # (3, 1024)
a1, a2, a3 = a[0], a[1], a[2]
print(a1.shape)  # (1024,)
class myDataset(Dataset):
    """
    Parameters
    ----------
    data : list of numpy arrays
    """
    def __init__(self, data):
        assert isinstance(data, list), "myDataset: argument must be of type list"
        self.data = data

    def __getitem__(self, idx):
        return tuple(data[idx] for data in self.data)

    def __len__(self):
        return len(self.data)
data = myDataset([a1, a2, a3])
data_iter = DataLoader(data, batch_size=100, shuffle=False)

for index, values in enumerate(data_iter):
    print("index= ", index)
    print("values= ", values)
# Output of the for loop:
# index=  0
# values=  [tensor([-0.4421, -0.4562,  1.2012], dtype=torch.float64),
#           tensor([-0.8228, -0.7304,  0.6380], dtype=torch.float64),
#           tensor([ 1.2241,  0.4840, -0.0031], dtype=torch.float64)]
# I expected to collect roughly 10 batches of size 100.
# Instead, the for loop runs only a single iteration and yields
# the equivalent of one batch of size 1.
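For reference, this is the batching behavior I expected. The sketch below is a minimal comparison, not my actual code: it swaps my custom class for torch.utils.data.TensorDataset built from the same three arrays (assuming TensorDataset matches my intent of indexing along the 1024-element axis). With it, the same DataLoader settings produce ten full batches plus a final partial batch:

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

a = np.random.randn(3, 1024)
a1, a2, a3 = (torch.from_numpy(x) for x in a)

# TensorDataset indexes along the first dimension of each tensor,
# so its length here is 1024, not 3.
ds = TensorDataset(a1, a2, a3)
loader = DataLoader(ds, batch_size=100, shuffle=False)

batches = list(loader)
print(len(batches))         # 11: ten batches of 100 plus one of 24
print(batches[0][0].shape)  # torch.Size([100])
```

This is the shape of output I was hoping to get from my own Dataset.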