The problem: I have images that I’ve loaded and then stored as numpy arrays. The dataset is quite big, so I realized I have to split it into several files and load one at a time. I’ve tried to create my own dataset class as follows:

```
import numpy as np
import torch
from torch.utils.data import Dataset

class my_Dataset(Dataset):
    # Characterizes a dataset for PyTorch
    def __init__(self, folder_dataset, transform=None):
        # xs, ys will hold the file names of the data
        self.xs = []
        self.ys = []
        self.transform = transform
        # Open and load the text file listing the whole training data
        with open(folder_dataset + 'data.txt') as f:
            for line in f:
                self.xs.append(folder_dataset + line.split()[0])
                self.ys.append(folder_dataset + line.split()[1])
        # pick a random one of these (sub)files of the dataset
        file_ID = np.random.randint(1, len(self.xs))
        numpy_data = np.load('x_imgs_ID_' + str(file_ID) + '.npy')
        # move the channel axis: (N, H, W, C) -> (N, C, H, W)
        numpy_data = np.moveaxis(numpy_data, [3], [1])
        numpy_target = np.load('y_imgs_ID_' + str(file_ID) + '.npy')
        # convert the numpy arrays to tensors
        self.data = torch.from_numpy(numpy_data).float()
        self.target = torch.from_numpy(numpy_target).long()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Generates one sample of data; index comes from the DataLoader
        single_x = self.data[index]
        single_y = self.target[index]
        return single_x, single_y
```
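To show what the `np.moveaxis` line is doing: my files are stored channels-last, `(N, H, W, C)`, and I move the channel axis to position 1 so the tensors end up channels-first, `(N, C, H, W)`, which PyTorch conv layers expect. A quick check with made-up shapes:

```
import numpy as np

# Dummy batch of 4 RGB images, 32x32, stored channels-last (N, H, W, C)
numpy_data = np.random.rand(4, 32, 32, 3)

# Move the channel axis (position 3) to position 1: channels-first (N, C, H, W)
nchw = np.moveaxis(numpy_data, [3], [1])
print(nchw.shape)  # (4, 3, 32, 32)
```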

I’m new to PyTorch and deep learning in general, so I’m trying to learn. Am I doing it correctly in this dataset class? Specifically, the np.random part feels super weird to me. I’ve read a bunch of posts and tutorials trying to understand how this works, but it’s still difficult for me to grasp.
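What I suspect I actually want, though I’m not sure, is a deterministic mapping from a global sample index to a (file, offset) pair instead of picking a random file once in `__init__`. Something like this, where `samples_per_file` is just a placeholder for however many samples each .npy file holds:

```
def index_to_file(index, samples_per_file):
    # Map a global sample index to (file_id, offset within that file)
    return divmod(index, samples_per_file)

print(index_to_file(0, 100))    # (0, 0)
print(index_to_file(250, 100))  # (2, 50)
```

Then `__getitem__` could load the right file on demand rather than committing to one file for the lifetime of the dataset. Is that the right direction?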

Anyone that can give me some feedback on this is very much appreciated!

/Dino