I have a dataloader that wraps 100 samples of data.
I have a WeightedRandomSampler that returns 1000 samples.
However, when I create a new dataloader with this sampler, the new dataloader still reports 100 samples, not the 1000 I expect - why is this?
from random import randint
from torch.utils.data import WeightedRandomSampler

# dl1 is a dataloader with 100 samples
weights = [randint(1, 10) for _ in range(100)]
nsamples = 1000
sampler = WeightedRandomSampler(weights, nsamples)

dl2 = dl1.new(shuffle=False, sampler=sampler)
print((len(dl1.y), len(dl2.y)))
# outputs (100, 100) - I expect (100, 1000)
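For reference, a plain torch.utils.data.DataLoader does honor the sampler's num_samples: it changes how many items are drawn per epoch, while the dataset itself keeps its original length. A minimal check, using a dummy TensorDataset in place of my actual data:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

ds = TensorDataset(torch.arange(100))                  # 100-sample dummy dataset
sampler = WeightedRandomSampler([1.0] * 100, num_samples=1000)
loader = DataLoader(ds, batch_size=10, sampler=sampler)

print(len(ds))                              # 100  - dataset size is unchanged
print(len(sampler))                         # 1000 - indices drawn per epoch
print(len(loader))                          # 100  - batches per epoch (1000 / 10)
print(sum(b[0].numel() for b in loader))    # 1000 - samples actually yielded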
torch.utils.data.DataLoader doesn't implement a new() method (or I'm not aware of it). Which DataLoader implementation are you using?
@ptrblck I am using the fastai implementation.
You are right, @ptrblck - the class I am working with is the fastai DeviceDataLoader, which binds a DataLoader to a torch.device.
Thanks for the information. I'm unfortunately not familiar with this DeviceDataLoader, but do you know why you would like to bind a DataLoader to a specific device?
From the documentation:

Bind a DataLoader to a torch.device. Put the batches of dl on device after applying an optional list of tfms. collate_fn will replace the one of dl. All dataloaders of a DataBunch are of this type.
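For what it's worth, such a wrapper is conceptually just an iterator that moves each batch onto the bound device. A minimal sketch of the idea - the class and attribute names here are illustrative, not fastai's actual implementation:

import torch
from torch.utils.data import DataLoader

class SimpleDeviceLoader:
    "Illustrative sketch of a device-binding wrapper (not fastai's code)."
    def __init__(self, dl: DataLoader, device: torch.device):
        self.dl, self.device = dl, device

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for batch in self.dl:
            # move every tensor in the batch onto the bound device
            yield [t.to(self.device) for t in batch]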
I noticed that the new method just merges the stored init kwargs of the wrapped DataLoader with the ones passed to new() (the new ones taking precedence) and constructs a new DeviceDataLoader from them:
def new(self, **kwargs):
    "Create a new copy of `self` with `kwargs` replacing current values."
    new_kwargs = {**self.dl.init_kwargs, **kwargs}
    return DeviceDataLoader(self.dl.__class__(self.dl.dataset, **new_kwargs),
                            self.device, self.tfms, self.collate_fn)
I don’t see why the returned object is not oversampling as expected.
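One thing worth checking: len(dl2.y) presumably counts the dataset's labels, and new() reuses the same dataset (self.dl.dataset above), so that number would stay at 100 no matter which sampler is attached; the sampler should only control how many indices are drawn per epoch. A hypothetical check, using the .dl attribute from the snippet above (attribute names may differ across fastai versions):

inner = dl2.dl                  # the wrapped torch DataLoader, per new() above
print(len(inner.dataset))       # 100  - same dataset object, so unchanged
print(len(inner.sampler))       # 1000 - the WeightedRandomSampler did get through
print(len(inner))               # batches per epoch, driven by the sampler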
I’m unfortunately not familiar enough with this wrapper, but @jphoward should be able to help out. 