I would like to remove specific indices from a dataset. I tried this, but it doesn't work:
self.cifar10 = datasets.CIFAR10(root='./data',
self.data = self.cifar10.data
self.targets = self.cifar10.targets
self.final_data, self.final_targets = self.__remove__(remove_list)
def __getitem__(self, index):
    data, target = self.final_data[index], self.final_targets[index]
    return data, target, index
def __remove__(self, remove_list):
    data = np.delete(self.data, remove_list)
    targets = np.delete(self.targets, remove_list)
    return data, targets
I realized the issue was with the way I deleted items. It should be:
data = np.delete(self.data, remove_list, axis=0)
targets = np.delete(self.targets, remove_list, axis=0)
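For anyone hitting the same problem, here is a small sketch of why `axis=0` matters. The array is a toy stand-in for the image data (the shape and values are made up for illustration), not the real CIFAR-10 array:

```python
import numpy as np

# Toy stand-in for the image array: 4 "images" of shape 2x2x3 (arbitrary values).
data = np.arange(4 * 2 * 2 * 3).reshape(4, 2, 2, 3)
remove_list = [1, 3]

# Without axis, np.delete flattens the input and removes single elements.
flat = np.delete(data, remove_list)

# With axis=0, whole samples along the first dimension are removed.
kept = np.delete(data, remove_list, axis=0)

print(flat.shape)  # (46,)
print(kept.shape)  # (2, 2, 2, 3)
```

So without `axis`, the result is a flat array with two pixel values missing rather than a dataset with two images removed.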
But is it doing the correct thing overall? That is, does it remove specific images based on their index, or is the index different every time the dataset is loaded?
The data should be loaded in the same order, but of course you could verify it by comparing some random data samples.
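One possible way to run that check is sketched below. The helper `same_order` is a hypothetical name, and the two arrays are toy stand-ins for the `.data` attributes of two separately created dataset objects:

```python
import numpy as np

def same_order(data_a, data_b, num_samples=10, seed=0):
    """Compare randomly chosen samples at identical indices in two loads."""
    if len(data_a) != len(data_b):
        return False
    rng = np.random.default_rng(seed)
    num_samples = min(num_samples, len(data_a))
    idx = rng.choice(len(data_a), size=num_samples, replace=False)
    return all(np.array_equal(data_a[i], data_b[i]) for i in idx)

# Stand-ins for the arrays obtained from two separate loads (hypothetical data).
first_load = np.arange(5 * 2 * 2 * 3).reshape(5, 2, 2, 3)
second_load = first_load.copy()

print(same_order(first_load, second_load))        # True
print(same_order(first_load, second_load[::-1]))  # False
```

With the real dataset you would pass the `.data` arrays of the two dataset objects instead of the toy arrays. A random spot check like this can miss a reordering that only affects a few samples, so compare the full arrays if you need certainty.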
np.delete should work fine on numpy arrays. Alternatively, you could also slice the arrays by creating a
mask array and setting the values at the remove_list indices to False:
mask = np.ones(len(arr), dtype=bool)
mask[remove_list] = False
data = self.data[mask]
This is so helpful, thank you so much!
Can you explain in more detail what the last two lines of code do?
The code snippet first initializes a mask with True values for all entries.
The second line of code then uses the remove_list indices to index mask and sets these values to False. In the last line of code, self.data is indexed with mask and the result is assigned to data, which will then contain all entries from self.data where mask was set to True.
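The same steps on a tiny array, as a sketch (the values are made up, and `data` here stands in for `self.data`):

```python
import numpy as np

data = np.array([10, 11, 12, 13, 14])  # toy stand-in for self.data
remove_list = [1, 3]

mask = np.ones(len(data), dtype=bool)  # [True, True, True, True, True]
mask[remove_list] = False              # [True, False, True, False, True]
kept = data[mask]                      # keeps entries where mask is True

print(kept)  # [10 12 14]

# Same result as np.delete along the first axis.
assert np.array_equal(kept, np.delete(data, remove_list, axis=0))
```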
Thank you, very clear! So (correct me if I’m wrong) you also need a further line to do the same on the targets, right? Something like:
targets = self.targets[mask]
Yes, if you are working with a target tensor and want to remove the same indices, you would have to add your line of code.
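One caveat: if the targets are a plain Python list rather than an array or tensor (torchvision's CIFAR10 stores its labels that way, as far as I know), indexing the list directly with a boolean mask raises an error. A minimal sketch with made-up labels, converting to a numpy array first:

```python
import numpy as np

targets = [3, 8, 8, 0, 6]  # toy labels stored as a plain list
remove_list = [1, 3]

mask = np.ones(len(targets), dtype=bool)
mask[remove_list] = False

# A plain list cannot be indexed with a boolean array, so convert it first.
kept_targets = np.asarray(targets)[mask]

print(kept_targets)  # [3 8 6]
```

Call `kept_targets.tolist()` afterwards if the rest of your code expects a list.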