I want to delete a sample from my dataset if it does not satisfy a certain condition. I cannot check this condition beforehand. The sample code below shows how I would like it to work.
Will this work?
If not, how should I do it?
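from torch.utils.data import Dataset

class Mydataset(Dataset):
    def __init__(self, data_list):
        self.data = data_list

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # some_condition is a placeholder for the real check
        if some_condition:
            sample = self.data[idx]
        else:
            # drop the failing sample and retry with whatever now sits at idx
            del self.data[idx]
            sample = self.__getitem__(idx)
        return sample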
I cannot really do that, because the same sample might sometimes pass and sometimes fail the condition.
How about this code? I will set shuffle to False and shuffle the dataset manually beforehand. I think this would make each batch pick samples in sequence.
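import numpy as np
from torch.utils.data import Dataset

class Mydataset(Dataset):
    def __init__(self, data_list):
        self.data = data_list
        self.len = len(self.data)

    def __len__(self):
        return self.len

    def __getitem__(self, idx):
        # idx can fall off the end once samples have been deleted
        if idx >= self.len:
            idx = np.random.randint(0, self.len)
        # some_condition is a placeholder for the real check
        if some_condition:
            sample = self.data[idx]
        else:
            del self.data[idx]
            self.len = len(self.data)
            sample = self.__getitem__(idx)
        return sample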
This would result in some duplicate samples being passed to the network. I am okay with that. Do you think this will work?
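
For comparison, here is a minimal sketch of the same resampling idea that leaves the list untouched, so `__len__` stays constant and the indices handed out by the sampler stay valid. It is only an illustration under assumptions: `is_valid` is a hypothetical stand-in for the real condition, and `ResamplingDataset` and `max_tries` are made-up names. The `ImportError` fallback is there only so the sketch runs without PyTorch installed.

```python
import random

try:
    from torch.utils.data import Dataset
except ImportError:
    # Fallback so the sketch still runs where PyTorch is not installed.
    Dataset = object


def is_valid(sample):
    """Hypothetical stand-in for the real condition."""
    return sample % 3 != 0


class ResamplingDataset(Dataset):
    def __init__(self, data_list, max_tries=10):
        self.data = data_list
        self.max_tries = max_tries  # hypothetical cap to avoid looping forever

    def __len__(self):
        # The list is never modified, so the length never changes.
        return len(self.data)

    def __getitem__(self, idx):
        for _ in range(self.max_tries):
            sample = self.data[idx]
            if is_valid(sample):
                return sample
            # The sample failed the check: try a random index instead of deleting.
            idx = random.randrange(len(self.data))
        raise RuntimeError("no valid sample found after max_tries attempts")
```

As with the version above, this passes some duplicates to the network, but a sample that fails once can still be picked up later if it passes on another attempt.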