My code looks like this:

    import numpy as np
    import torch
    from PIL import Image
    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # One weight per class (13 classes)
    cls_weights = np.array([
        0.027, 0.027, 0.022, 0.093, 0.310, 0.310, 0.058,
        0.012, 0.077, 0.006, 0.024, 0.028, 0.001])

    vds = datasets.ImageFolder(
        data_path,
        transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize([0.456], [0.224])]),
        loader=Image.open)

    sampler = torch.utils.data.sampler.WeightedRandomSampler(cls_weights, len(vds))
    dataloader = DataLoader(vds, batch_size=batch_size, sampler=sampler, num_workers=4)

    for data in dataloader:
        inputs, labels = data
        ...

But all the sampled labels are 0. If I remove the sampler, the code works fine.
I tried drawing from the sampler directly with next(sampler.__iter__()), and the output looks fine: [8, 6, 3, 3, 3, 0, 6, 8, 1, 7]
I tried reading the DataLoader and DataLoaderIter source code, but found it a bit hard to follow.
Can anyone tell me where’s the problem? Thanks~
The reason is probably that shuffle is set to False when a sampler is used, so a sequential sampler ends up being used.
But training in random order is crucial for my dataset. In my opinion, sampling should work like this: first sample a batch of labels from the multinomial distribution, then sample data from the dataset that has the corresponding labels. Why would PyTorch choose sequential sampling? Training on different classes separately would definitely cause problems.
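One likely explanation, hedged because the dataset layout is not shown: WeightedRandomSampler expects one weight per sample, not one per class. Given only 13 weights, it can only ever return indices 0 to 12, and since ImageFolder orders its samples by class folder, those indices would all belong to class 0 (assuming the first class has at least 13 images). That would also fit the [8, 6, 3, ...] output above, which are dataset indices rather than labels. Below is a minimal sketch of the label-first sampling described above, using a toy TensorDataset in place of the real ImageFolder. The three class probabilities and the toy data are made up for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-in for the ImageFolder dataset (hypothetical data): 3 classes,
# samples stored class by class, as ImageFolder orders them.
targets = torch.tensor([0] * 6 + [1] * 3 + [2] * 1)
features = torch.randn(len(targets), 4)
dataset = TensorDataset(features, targets)

# Hypothetical target probability for each class (one entry per class).
cls_weights = torch.tensor([0.2, 0.3, 0.5])

# Number of samples in each class.
class_counts = torch.bincount(targets).float()

# One weight per SAMPLE: the class probability spread evenly over the
# samples of that class. Drawing with these weights amounts to picking a
# class from cls_weights and then a uniform sample within that class.
sample_weights = (cls_weights / class_counts)[targets]

sampler = WeightedRandomSampler(sample_weights, num_samples=len(dataset),
                                replacement=True)
loader = DataLoader(dataset, batch_size=5, sampler=sampler)

for inputs, labels in loader:
    print(labels.tolist())  # labels now come from several classes
```

With the real dataset, the per-sample labels should be available as vds.targets in recent torchvision versions (an assumption; older versions may only expose the (path, label) pairs in vds.imgs), so the same indexing would apply to the 13-entry cls_weights.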