Dataloader for Weakly-Supervised Learning

Hello,

I need a data loader that is compatible with my model.

I want to run a model, say ResNet, on a dataset. In each epoch, I select, for example, 100 images randomly, find the top 20 of those 100 selected images (the ones with the lowest loss), carry these 20 over to the next epoch, and select 80 more.

In general, in each epoch I need 100 images, where 20 of them are the top images from the previous epoch and 80 are randomly chosen. Is there any way to implement this in the DataLoader?
What I am thinking is to write a batch_sampler that handles this, but I am not sure how to tell it which are the top 20 images.

Thanks,

You could create a custom Subset or SubsetRandomSampler for each training “iteration”.
I.e. in the first iteration you would select 100 random indices and store the desired 20 indices after the training is done. Afterwards you would recreate the Subset or sampler as well as the DataLoader and repeat this workflow.
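A minimal sketch of that workflow, assuming you can record a per-sample loss during training (the helper names `select_indices` and `top_k_lowest_loss` are made up for illustration; the random losses below are a stand-in for the ones your training loop would collect):

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

def select_indices(prev_top, all_indices, num_total=100, generator=None):
    """Keep the carried-over indices and fill up with fresh random ones."""
    pool = [i for i in all_indices if i not in set(prev_top)]
    perm = torch.randperm(len(pool), generator=generator).tolist()
    fresh = [pool[i] for i in perm[: num_total - len(prev_top)]]
    return list(prev_top) + fresh

def top_k_lowest_loss(indices, losses, k=20):
    """Return the k indices with the lowest recorded loss."""
    order = sorted(range(len(indices)), key=lambda i: losses[i])
    return [indices[i] for i in order[:k]]

# dummy dataset with 1000 samples
dataset = TensorDataset(torch.randn(1000, 3), torch.randint(0, 10, (1000,)))
all_indices = list(range(len(dataset)))

top = []  # no carried-over samples in the first epoch
for epoch in range(3):
    epoch_indices = select_indices(top, all_indices, num_total=100)
    # recreate the sampler and DataLoader for this epoch
    sampler = SubsetRandomSampler(epoch_indices)
    loader = DataLoader(dataset, batch_size=10, sampler=sampler)
    # ... training loop over `loader` would go here ...
    # stand-in for per-sample losses, aligned with epoch_indices:
    losses = torch.rand(len(epoch_indices)).tolist()
    top = top_k_lowest_loss(epoch_indices, losses, k=20)
```

Note that `SubsetRandomSampler` shuffles the indices, so in a real run you would record losses keyed by dataset index (e.g. via the custom Subset shown below in this thread) rather than by batch position.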


Hi @NoobCoder,

@ptrblck is suggesting something like this,

class YourCustomSubset(torch.utils.data.Subset):
    r"""
    Same as torch.utils.data.Subset, but also returns each sample's position
    within the subset, so you can map per-sample losses back to dataset
    indices and choose which ones to keep.
    """

    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]], idx
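A short usage sketch (the dummy dataset and indices are my own, not from the notebook): each item comes back as `(sample, position)`, and `subset.indices[position]` recovers the original dataset index, which is what you need to track the lowest-loss samples.

```python
import torch
from torch.utils.data import TensorDataset

class YourCustomSubset(torch.utils.data.Subset):
    def __getitem__(self, idx):
        return self.dataset[self.indices[idx]], idx

# labels equal the dataset index here, to make the mapping visible
dataset = TensorDataset(torch.randn(10, 3), torch.arange(10))
subset = YourCustomSubset(dataset, indices=[2, 5, 7])

(x, y), pos = subset[0]
# subset.indices[pos] gives the original dataset index (here: 2)
```

When wrapped in a DataLoader, the default collate function batches the positions into a tensor alongside the samples, so you can accumulate losses per original index across the epoch.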

Check this gist/Colab notebook for the full implementation with a dummy example.
