I have a dataloader that wraps 100 samples of data.
I have a WeightedRandomSampler that returns 1000 samples.
However, when I create a new dataloader with this sampler, the new dataloader still reports 100 samples, not the 1000 I expect - why is this?
from random import randint
from torch.utils.data import WeightedRandomSampler

# dl1 is a dataloader with 100 samples
weights = [randint(1, 10) for _ in range(100)]
nsamples = 1000
sampler = WeightedRandomSampler(weights, nsamples)

dl2 = dl1.new(shuffle=False, sampler=sampler)
print((len(dl1.y), len(dl2.y)))
# outputs (100, 100) - I expect (100, 1000)
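For reference, a plain torch.utils.data.DataLoader does honor the sampler's num_samples: it changes how many items are drawn per epoch, while the dataset itself keeps its original length. A minimal check, using a dummy TensorDataset in place of my actual data:

import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

ds = TensorDataset(torch.arange(100))                  # 100-sample dummy dataset
sampler = WeightedRandomSampler([1.0] * 100, num_samples=1000)
loader = DataLoader(ds, batch_size=10, sampler=sampler)

print(len(ds))                              # 100  - dataset size is unchanged
print(len(sampler))                         # 1000 - indices drawn per epoch
print(len(loader))                          # 100  - batches per epoch (1000 / 10)
print(sum(b[0].numel() for b in loader))    # 1000 - samples actually yielded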
torch.utils.data.DataLoader doesn't implement a new() method (or I'm not aware of it). Which DataLoader implementation are you using?
@ptrblck I am using the fastai implementation.
You are right, @ptrblck - the class I am working with is the fastai DeviceDataLoader, which binds a DataLoader to a torch.device.
Thanks for the information. I'm unfortunately not familiar with this DeviceDataLoader, but do you know why you would like to bind a DataLoader to a specific device?
From the documentation:

Bind a DataLoader to a torch.device. Put the batches of dl on device after applying an optional list of tfms. collate_fn will replace the one of dl. All dataloaders of a DataBunch are of this type.
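For what it's worth, such a wrapper is conceptually just an iterator that moves each batch onto the bound device. A minimal sketch of the idea - the class and attribute names here are illustrative, not fastai's actual implementation:

import torch
from torch.utils.data import DataLoader

class SimpleDeviceLoader:
    "Illustrative sketch of a device-binding wrapper (not fastai's code)."
    def __init__(self, dl: DataLoader, device: torch.device):
        self.dl, self.device = dl, device

    def __len__(self):
        return len(self.dl)

    def __iter__(self):
        for batch in self.dl:
            # move every tensor in the batch onto the bound device
            yield [t.to(self.device) for t in batch]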
I noticed that the new method just merges the stored init kwargs of the wrapped DataLoader with the ones passed to new() (the new ones taking precedence) and constructs a new DeviceDataLoader from them:
def new(self, **kwargs):
    "Create a new copy of `self` with `kwargs` replacing current values."
    new_kwargs = {**self.dl.init_kwargs, **kwargs}
    return DeviceDataLoader(self.dl.__class__(self.dl.dataset, **new_kwargs),
                            self.device, self.tfms, self.collate_fn)
I don’t see why the returned object is not oversampling as expected.
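One thing worth checking: len(dl2.y) presumably counts the dataset's labels, and new() reuses the same dataset (self.dl.dataset above), so that number would stay at 100 no matter which sampler is attached; the sampler should only control how many indices are drawn per epoch. A hypothetical check, using the .dl attribute from the snippet above (attribute names may differ across fastai versions):

inner = dl2.dl                  # the wrapped torch DataLoader, per new() above
print(len(inner.dataset))       # 100  - same dataset object, so unchanged
print(len(inner.sampler))       # 1000 - the WeightedRandomSampler did get through
print(len(inner))               # batches per epoch, driven by the sampler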
I’m unfortunately not familiar enough with this wrapper, but @jphoward should be able to help out. 