How to use ImageFolder with sampler?

Musoy_King · December 23, 2017, 5:06pm

My code looks like this:

import numpy as np
import torch
from PIL import Image
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# One weight per class (13 classes)
cls_weights = np.array([0.027, 0.027, 0.022, 0.093, 0.310, 0.310, 0.058, 0.012, 0.077, 0.006, 0.024, 0.028, 0.001])
vds = datasets.ImageFolder(data_path, transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize([0.456], [0.224])]), loader=Image.open)
sampler = torch.utils.data.sampler.WeightedRandomSampler(cls_weights, len(vds))
dataloader = DataLoader(vds, batch_size=batch_size, sampler=sampler, num_workers=4)
for data in dataloader:
    inputs, labels = data
    ...

But all the sampled labels are 0. If I remove the sampler, the code works fine.
I tried drawing from the sampler with next(sampler.__iter__()), and the output looks fine: [8, 6, 3, 3, 3, 0, 6, 8, 1, 7]
I tried to read the DataLoader and DataLoaderIter source code, but found it a bit hard to follow.
Can anyone tell me where the problem is? Thanks~

Update:
I think the reason is that shuffle is set to False when a sampler is passed, so a sequential sampler ends up being used.

But training in random order is crucial for my dataset. In my opinion, sampling should work like this: first sample a batch of labels from the multinomial distribution, then sample data from the dataset that has the corresponding labels. Why would PyTorch choose sequential sampling? Training on the classes one after another would definitely cause problems.
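
The two-stage scheme described above could be sketched roughly as follows. This is only an illustration in plain Python, not how PyTorch implements its samplers: the function name two_stage_indices, the toy targets list, and the weight values are all made up for the example.

```python
import random
from collections import defaultdict


def two_stage_indices(targets, cls_weights, num_samples, seed=None):
    """Hypothetical helper: pick a class first, then a sample of that class."""
    rng = random.Random(seed)

    # Group dataset indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(targets):
        by_class[label].append(idx)

    classes = sorted(by_class)
    weights = [cls_weights[c] for c in classes]

    # Stage 1: draw class labels from the multinomial distribution.
    labels = rng.choices(classes, weights=weights, k=num_samples)
    # Stage 2: draw one dataset index uniformly from each chosen class.
    return [rng.choice(by_class[c]) for c in labels]


# Toy labels standing in for a real dataset (illustrative values only).
targets = [0] * 5 + [1] * 3 + [2] * 2
indices = two_stage_indices(targets, [0.2, 0.3, 0.5], num_samples=20, seed=0)
print(indices)
```

Since DataLoader accepts any iterable of indices as its sampler argument, a list like this could in principle be passed as sampler=indices, although a fresh list would have to be drawn for each epoch.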

smth · January 11, 2018, 2:09pm

WeightedRandomSampler already shuffles your dataset.
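
A likely cause of the all-zero labels: WeightedRandomSampler expects one weight per sample, not one per class, and it draws indices from range(len(weights)). With 13 class weights it only ever returns indices 0 to 12, and because ImageFolder orders its samples by class, those indices presumably all belong to class 0. The sketch below reproduces this on a small synthetic dataset and then expands the class weights into per-sample weights. The TensorDataset, the class counts, and the weight values are stand-ins invented for the example.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

torch.manual_seed(0)

# Hypothetical stand-in for the ImageFolder dataset: 3 classes, with samples
# sorted by class, which is how ImageFolder orders them.
targets = torch.tensor([0] * 50 + [1] * 30 + [2] * 20)
features = torch.randn(len(targets), 4)
dataset = TensorDataset(features, targets)

# Illustrative per-class weights (not the values from the question).
cls_weights = torch.tensor([0.2, 0.3, 0.5], dtype=torch.double)

# Problem: with per-class weights, the sampler only yields indices 0..2,
# and those samples all belong to class 0.
bad_sampler = WeightedRandomSampler(cls_weights, num_samples=len(dataset))
bad_labels = {int(targets[i]) for i in bad_sampler}

# Fix: look up the class weight of every sample, giving one weight per sample.
# For ImageFolder the labels could come from vds.targets, or from
# [label for _, label in vds.imgs] on older torchvision versions.
sample_weights = cls_weights[targets]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=10, sampler=sampler)

seen_labels = set()
for inputs, labels in loader:
    seen_labels.update(labels.tolist())

print(bad_labels, seen_labels)
```

Note that with this expansion the chance of drawing a class is proportional to its weight times its number of samples. If the class weights are meant as target class probabilities, dividing each one by the class count first would be one way to account for that.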