I want the features from every iteration to have a size of (32, 390), but my custom DataLoader gives me the full-size data instead.
Hello,
The problem you have is a MemoryError because you keep using your whole dataset inside your __getitem__ method. What __getitem__ needs to do is return a single example from your dataset (so one feature tensor and one output tensor in your case). The PyTorch DataLoader will then take care of giving you a batch tensor (it concatenates batch_size tensors coming from your __getitem__ function). With that in mind, here is a small example of how to adapt your code so you don't get the error:
import numpy as np
import torch

class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, idx):
        # Do whatever you do with your columns.
        output_col = 4
        # Only take the index of the data that you are interested in.
        output = self.data[idx, output_col]
        # Arbitrary column ranges on my part, use your own columns!
        normalized_data = self.data[idx, 10:12]
        one_hot_encoded_data = self.data[idx, 90:100]
        features = np.concatenate([normalized_data, one_hot_encoded_data])
        features = torch.from_numpy(features)
        return features, output

    def __len__(self):
        return len(self.data)

data = np.random.rand(838682, 390)
dataset = CustomDataset(data)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

for feature, label in loader:
    print(feature.shape)
    # torch.Size([32, 12])
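If you really want each batch to have the shape (32, 390) from your first post, you can return the full row from __getitem__ instead of slicing out a few columns. Here is a minimal sketch, assuming all 390 columns of your data are features and the label comes from a separate array (labels is a hypothetical name here); adjust it if your label is actually one of the 390 columns:

import numpy as np
import torch

class FullRowDataset(torch.utils.data.Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __getitem__(self, idx):
        # Return the whole row (all 390 columns) as the feature tensor.
        features = torch.from_numpy(self.data[idx])
        label = self.labels[idx]
        return features, label

    def __len__(self):
        return len(self.data)

data = np.random.rand(838682, 390)
labels = np.random.randint(0, 2, size=838682)  # placeholder labels
dataset = FullRowDataset(data, labels)
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

features, label_batch = next(iter(loader))
print(features.shape)
# torch.Size([32, 390])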
Hope it helps!
Thank you! It solved my problem :)