Custom data loader for big data

Hi everyone,

I’m trying to load a dataset that is about 16 GB in Colab, but as you know, with the limited RAM it is not possible to load it all at once. The best solution (in my opinion!) would be to load the data in batches (my data is located in my Google Drive). However, I can’t figure out how to write a custom data loader that does this for me (i.e. loads data from my Google Drive in specified batches). Can anyone help me?
Note: my data is stored as .npy files.

Thank you,

The best approach is to write a custom Dataset, something like:

import os
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# custom dataset: one .npy file is read per sample, so only the current
# batch ever needs to fit in memory
class MRIData(Dataset):
    def __init__(self, root='./data'):
        self.root = root
        # list of all image files
        self.patients = os.listdir(root)

    def __len__(self):
        return len(self.patients)

    def __getitem__(self, idx):
        patient = self.patients[idx]
        filename = os.path.join(self.root, patient)
        image = np.load(filename, allow_pickle=True)

        # adjust the indexing below to match how your arrays were saved
        images = torch.from_numpy(image[0][0])
        label = torch.tensor(abs(image[0][1].item()), dtype=torch.float)

        return images, label

train_ds = MRIData()
train_dl = DataLoader(train_ds, batch_size=1)  # add whatever args you need

You can try this by changing the path to your data.
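If your data isn’t split into many small files but sits in one big .npy array that doesn’t fit in RAM, memory-mapping it is another option. A minimal sketch, assuming a single array file (the path and batch size are just placeholders):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

# memory-map one large .npy file so only the slices that are actually
# indexed get read from disk
class MemmapData(Dataset):
    def __init__(self, path='/content/drive/MyDrive/data.npy'):  # placeholder path
        self.data = np.load(path, mmap_mode='r')  # nothing is loaded into RAM yet

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # copy() pulls just this one sample off disk
        return torch.from_numpy(self.data[idx].copy())

loader = DataLoader(MemmapData(), batch_size=32)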

Thanks for your answer.
I couldn’t understand how to load them in batches (in other words, which parameter defines the batch size?).
Moreover, I’ve converted my data and its labels into a single tensor file using TensorDataset and then saved it to my Google Drive (using torch.save). Is it possible to load that in batches?
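For reference, this is roughly what I mean, a minimal sketch with dummy tensors (the path and sizes are just placeholders):

import torch
from torch.utils.data import TensorDataset, DataLoader

# what I did: bundle data and labels into one dataset and save it as a single file
data = torch.randn(100, 3, 64, 64)   # stand-in for my images
labels = torch.zeros(100)            # stand-in for my labels
torch.save(TensorDataset(data, labels), 'dataset.pt')  # in my case this file is on Google Drive

# what I'd like: load that file back and iterate over it in batches
dataset = torch.load('dataset.pt')
loader = DataLoader(dataset, batch_size=32, shuffle=True)  # is batch_size the parameter I need?
for images, targets in loader:
    pass  # each iteration yields a batch of 32 samples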