Make custom datasets from arbitrary indices

Hi. I want to train a model on different subsets of one full dataset, similar to k-fold cross-validation. My data comes from a custom dataset. Each time, I have a list of the indices I want from it. For example, mylist is [0,1,2,3], and I need only those indices from the full dataset. The full dataset has 57 images, but I only need the 0th, 1st, 2nd and 3rd images. Below is my custom dataset. I cannot figure out how to connect index with mylist. If anyone has had the same issue, please help.

import torch
import torchtuples as tt
from torch.utils.data import Dataset

class DatasetBatch(Dataset):
    def __init__(self, image_dataset, time, event, mylist):
        self.image_dataset = image_dataset
        self.time, self.event = tt.tuplefy(time, event).to_tensor()
        self.mylist = mylist  # indices wanted from the whole dataset

    def __len__(self):
        return len(self.mylist)

    def __getitem__(self, index):
        if not hasattr(index, '__iter__'):
            index = mylist

        img = [self.image_dataset[i]['image'] for i in mylist]
        img = torch.stack(img)

        return tt.tuplefy(img, (self.time[mylist], self.event[mylist]))

mylist = [0, 1, 2, 3]
dataset1 = DatasetBatch(our_ds, *our_target, mylist)

I’m not sure I understand the use case correctly, but it sounds as if you are looking for the torch.utils.data.Subset class, which would allow you to pass indices to it and only return samples from these indices.
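
A minimal sketch of how that could look. ToyDataset is a made-up stand-in for the full 57-sample dataset (the real one returns images plus time/event targets), so treat its names and tensor shapes as assumptions rather than your actual code:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset


class ToyDataset(Dataset):
    """Hypothetical stand-in for the full 57-sample dataset."""

    def __init__(self, n_samples=57):
        # Fake "images": sample i is a 1x2x2 tensor filled with the value i.
        base = torch.arange(n_samples, dtype=torch.float32).view(-1, 1, 1, 1)
        self.images = base.repeat(1, 1, 2, 2)
        # Fake targets: time equals the sample index, event is always 1.
        self.time = torch.arange(n_samples, dtype=torch.float32)
        self.event = torch.ones(n_samples)

    def __len__(self):
        return len(self.time)

    def __getitem__(self, index):
        return self.images[index], (self.time[index], self.event[index])


full_ds = ToyDataset()
mylist = [0, 1, 2, 3]

# Subset maps position k of the new dataset to full_ds[mylist[k]],
# so the full dataset class does not need to know about mylist at all.
subset = Subset(full_ds, mylist)

# The subset can be passed to a DataLoader like any other dataset.
loader = DataLoader(subset, batch_size=2, shuffle=False)
```

For a k-fold style setup you would build one Subset per fold from the same full dataset, each with its own index list, instead of writing the index list into the dataset class.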
