DataLoader does not return batch-sized chunks; each item in the iterator is the full dataset

I want the features from every iteration to have a size of (32, 390),
but my custom DataLoader gives me the full-size data.


The problem you have is a MemoryError, since you keep using your whole dataset in your __getitem__ method. In __getitem__, you should return a single example from your dataset (so one feature tensor and one output tensor, in your case). The PyTorch DataLoader will take care of giving you a batch tensor (it concatenates batch_size of the tensors returned by your __getitem__). With that in mind, here is a small example of how to adapt your code so you don't get the error:

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        # do whatever you do with your columns.
        output_col = 4
        # only take the index of the data that you are interested in.
        output = self.data[idx, output_col]

        # arbitrary column ranges on my part, use your columns!
        normalized_data = self.data[idx, 10:12]
        one_hot_encoded_data = self.data[idx, 90:100]

        features = np.concatenate([normalized_data, one_hot_encoded_data])
        features = torch.from_numpy(features)

        return features, output

    def __len__(self):
        return len(self.data)

data = np.random.rand(838682, 390)
dataset = CustomDataset(data)
loader = DataLoader(dataset, batch_size=32)

for feature, label in loader:
    print(feature.shape)  # torch.Size([32, 12])
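Since the original question asked for batches of shape (32, 390), note that if you return the full row from __getitem__ instead of a slice of columns, the DataLoader stacks them into exactly that shape. A minimal sketch (the class name and data sizes here are illustrative, not from the original code):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class FullRowDataset(Dataset):
    # Hypothetical variant that uses every column as the feature vector.
    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        # Return one full row (all 390 columns) as a tensor.
        return torch.from_numpy(self.data[idx])

    def __len__(self):
        return len(self.data)

data = np.random.rand(1000, 390)
loader = DataLoader(FullRowDataset(data), batch_size=32)

batch = next(iter(loader))
print(batch.shape)  # torch.Size([32, 390])
```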

Hope it helps!


Thank you! It solves my problem:)