Delete a dataset sample at runtime

I want to delete a sample if it does not satisfy a certain condition. I cannot check for this condition beforehand. I wrote some sample code showing how I want it to work.
Will this work?
If not, how should I do it?

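from torch.utils.data import Dataset

class Mydataset(Dataset):

    def __init__(self, data_list):
        self.data = data_list

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        if some_condition:  # placeholder for the runtime check
            sample = self.data[idx]
        else:
            del self.data[idx]
            # retry with the element that moved into this position
            sample = self.__getitem__(idx)

        return sample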

I think this won’t work: the sampler generates its indices from the length the dataset had when iteration started, so after a deletion it might use an invalid index.

Would it be possible to iterate the dataset once, store all invalid indices, and remove them before creating your new “clean” dataset?
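A minimal sketch of that idea, assuming the condition is stable enough to be checked in a single pass. `ListDataset`, `is_valid`, and `make_clean_dataset` are hypothetical names used only for illustration; `Subset` is the standard wrapper from `torch.utils.data`.

```python
from torch.utils.data import Dataset, Subset


class ListDataset(Dataset):
    """Thin map-style wrapper around a Python list (illustrative only)."""

    def __init__(self, data_list):
        self.data = data_list

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]


def is_valid(sample):
    # Hypothetical condition: keep non-negative numbers only.
    return sample >= 0


def make_clean_dataset(dataset):
    # One pass over the dataset to record which indices pass the check.
    valid_indices = [i for i in range(len(dataset)) if is_valid(dataset[i])]
    # Subset exposes only those indices and leaves the original untouched.
    return Subset(dataset, valid_indices)


full = ListDataset([3, -1, 4, -1, 5])
clean = make_clean_dataset(full)
```

The original dataset is never mutated here, so the indices handed out by a sampler built on `clean` stay valid for the whole epoch.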

I cannot really do that, because the same sample might sometimes pass and sometimes fail the condition.

How about this code? I will set shuffle to False and manually shuffle the dataset beforehand. I think this would make the DataLoader pick samples sequentially.

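import numpy as np
from torch.utils.data import Dataset

class Mydataset(Dataset):

    def __init__(self, data_list):
        self.data = data_list
        self.len = len(self.data)

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        # the list may have shrunk, so redraw indices that are out of range
        if idx >= self.len:
            idx = np.random.randint(0, self.len)
        if some_condition:  # placeholder for the runtime check
            sample = self.data[idx]
        else:
            del self.data[idx]
            self.len = len(self.data)
            # retry at the same position
            sample = self.__getitem__(idx)

        return sample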

This would result in some duplicate samples being passed to the network. I am okay with that. Do you think this will work?
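If duplicates are acceptable, a similar effect may be achievable without deleting anything: when a sample fails the check, redraw a random index and try again. The following is only a sketch under that assumption. `ResamplingDataset`, `is_valid`, and `max_tries` are hypothetical names, and the condition is passed in as a callable.

```python
import random

from torch.utils.data import Dataset


class ResamplingDataset(Dataset):
    """Redraws a random index when a sample fails the check (illustrative)."""

    def __init__(self, data_list, is_valid, max_tries=10):
        self.data = data_list
        self.is_valid = is_valid    # hypothetical callable: sample -> bool
        self.max_tries = max_tries  # hypothetical cap on redraws

    def __len__(self):
        # The length never changes, so sampler indices stay valid.
        return len(self.data)

    def __getitem__(self, idx):
        for _ in range(self.max_tries):
            sample = self.data[idx]
            if self.is_valid(sample):
                return sample
            # Rejected: draw a replacement index instead of deleting.
            idx = random.randrange(len(self.data))
        raise RuntimeError("no valid sample found within max_tries")


dataset = ResamplingDataset(
    [1, -2, 3, -4, 5], is_valid=lambda x: x > 0, max_tries=1000
)
```

Because the underlying list is left intact, a sample that fails now can still pass on a later access, which fits a condition that changes between calls.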

I’m not sure about deleting the sample from the dataset but did you try checking for the condition as you fetch the batches?

I think it is possible to remove/modify the batches fetched from the DataLoader in general.
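One way this could look, as a rough sketch: have `__getitem__` return `None` for a rejected sample and filter those out in a custom `collate_fn`. `MaybeInvalidDataset`, `is_valid`, and `skip_invalid_collate` are hypothetical names; `default_collate` and the `collate_fn` argument are part of `torch.utils.data`.

```python
from torch.utils.data import DataLoader, Dataset
from torch.utils.data.dataloader import default_collate


class MaybeInvalidDataset(Dataset):
    def __init__(self, data_list, is_valid):
        self.data = data_list
        self.is_valid = is_valid  # hypothetical callable: sample -> bool

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        sample = self.data[idx]
        # Signal a rejected sample with None instead of deleting it.
        return sample if self.is_valid(sample) else None


def skip_invalid_collate(batch):
    # Drop rejected samples, then collate the rest as usual.
    batch = [sample for sample in batch if sample is not None]
    if not batch:
        return None  # the whole batch was rejected; the caller must skip it
    return default_collate(batch)


dataset = MaybeInvalidDataset(
    [1.0, -2.0, 3.0, -4.0, 5.0, -6.0], is_valid=lambda x: x > 0
)
loader = DataLoader(
    dataset, batch_size=2, shuffle=False, collate_fn=skip_invalid_collate
)

# Skip batches in which every sample was rejected.
batches = [batch for batch in loader if batch is not None]
```

With this approach batches can come out smaller than `batch_size`, so the training loop should not assume a fixed batch dimension.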