You could return a constant tensor and filter it out in the training loop, but note that this approach reduces the effective batch size, and in the edge case you could end up with a completely empty batch.
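The filtering step could look roughly like the sketch below. It is plain Python to keep it dependency-free; `INVALID` and `drop_invalid` are hypothetical names, and in a real PyTorch loop the sentinel would be the constant tensor, with the filtering done in the loop or possibly in a custom `collate_fn`.

```python
# Hypothetical sentinel that a dataset's __getitem__ returns for an invalid sample.
INVALID = None


def drop_invalid(batch):
    """Return only the valid samples of a batch (assumed to be a list)."""
    return [sample for sample in batch if sample is not INVALID]


batches = [
    [1.0, INVALID, 3.0, 4.0],  # one invalid sample: the batch shrinks
    [INVALID, INVALID],        # edge case: every sample is invalid
]

effective_sizes = []
for batch in batches:
    valid = drop_invalid(batch)
    effective_sizes.append(len(valid))
    if not valid:
        continue  # empty batch: nothing to train on, skip this step
    # The forward/backward pass on `valid` would go here.
```

Both drawbacks show up here: the first batch trains on fewer samples than requested, and the second has to be skipped entirely.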
A better approach is to exclude the invalid indices from the start.
If you can compute the invalid indices beforehand (or in the __init__ method), you could store a valid_idx list and use it to map each requested index to a valid sample, so that only valid samples are returned:
def __init__(self):
    self.valid_idx = [0, 2, 3, 5, 6, 8, ...] # calculate only valid indices or filter out invalid ones

def __getitem__(self, index):
    idx = self.valid_idx[index]  # map the requested position to a valid raw index
    data = self.data[idx]
    return data

def __len__(self):
    return len(self.valid_idx)
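For a version that runs end to end, here is a minimal sketch of the same idea. `FilteredDataset` and `is_valid` are hypothetical names, and treating `None` as the marker for an invalid sample is an assumption. It is written as a plain class so that it runs without dependencies; in real code it would subclass `torch.utils.data.Dataset`.

```python
class FilteredDataset:
    """Map-style dataset that exposes only the valid samples of `data`."""

    def __init__(self, data, is_valid):
        self.data = data
        # Computed once, before training starts.
        self.valid_idx = [i for i, sample in enumerate(data) if is_valid(sample)]

    def __getitem__(self, index):
        # Translate the position among valid samples into the raw position.
        return self.data[self.valid_idx[index]]

    def __len__(self):
        return len(self.valid_idx)


# Toy data: None marks an invalid sample (an assumption for this sketch).
raw = [10, None, 20, 30, None, 40]
dataset = FilteredDataset(raw, is_valid=lambda sample: sample is not None)
samples = [dataset[i] for i in range(len(dataset))]
```

Because `__len__` reports only the valid count, a sampler never requests an invalid position, so every batch keeps its full size.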
Actually, I wrote a MongoDB wrapper that gathers data from several collections, and the wrapper is connected to a custom Dataset class.
So maybe I can build a valid index for the collections inside the MongoDB wrapper: an index built once before the training loop, by checking whether each document is valid.
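A pre-built index along those lines might look like the sketch below. Everything in it is an assumption made for illustration: `build_valid_index` and `has_text` are hypothetical helpers, the collection names are made up, and plain lists of dicts stand in for MongoDB collections so that the example runs without a database. With a real wrapper, each iterable would be a cursor over a collection.

```python
def build_valid_index(collections, is_valid):
    """Return (collection_name, document_id) pairs for every valid document.

    `collections` maps a collection name to an iterable of documents
    (dicts with an "_id" key).
    """
    index = []
    for name, documents in collections.items():
        for doc in documents:
            if is_valid(doc):
                index.append((name, doc["_id"]))
    return index


# Toy stand-in for several collections (hypothetical names and fields).
collections = {
    "train_a": [{"_id": 1, "text": "hello"}, {"_id": 2, "text": ""}],
    "train_b": [{"_id": 7, "text": "world"}, {"_id": 8}],
}


def has_text(doc):
    # Assumed validity rule: the document has a non-empty "text" field.
    return bool(doc.get("text"))


valid_index = build_valid_index(collections, has_text)
```

The Dataset could then report `len(valid_index)` as its length and, in `__getitem__`, look up `valid_index[index]` to fetch the matching document by `_id`.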