Some problems with WeightedRandomSampler

Alessio_Rosatelli · February 28, 2019, 2:55pm

Yeah, I see that now; I noticed it while trying to print the samples.

Anyway, I managed to get my code working, but I still have doubts about the sampler. My weights for each class are:


[0.00961538 0.00155763 0.00127551]

and that’s correct, since class_0 has only a few occurrences. When I don’t specify a sampler, I get a class distribution in every batch that looks like this (with a batch_size of 170):


(array([0, 1, 2]), array([ 6, 75, 89], dtype=int64))

(array([0, 1, 2]), array([11, 65, 94], dtype=int64))

(array([0, 1, 2]), array([13, 80, 77], dtype=int64))

(array([0, 1, 2]), array([15, 73, 82], dtype=int64))

(array([0, 1, 2]), array([10, 66, 94], dtype=int64))

and this looks good: it reflects the class distribution in my dataset, as I expected.
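
A quick sanity check on these numbers, as a sketch only. It assumes the weights above are simply 1 / class_count (how they were computed is not shown here), and the variable names are made up for illustration. Under that assumption, the weights imply the class counts, and from those the expected per-class count in a uniformly shuffled batch of 170:

```python
# Class weights as posted; ASSUMED here to be 1 / class_count.
class_weights = [0.00961538, 0.00155763, 0.00127551]
batch_size = 170

# Recover the class counts implied by that assumption.
class_counts = [round(1.0 / w) for w in class_weights]
total = sum(class_counts)

# Expected number of samples per class in a uniformly shuffled batch.
expected_per_batch = [batch_size * c / total for c in class_counts]

print(class_counts)
print([round(e, 1) for e in expected_per_batch])
```

If the assumption holds, this gives counts of 104, 642 and 784, and roughly 12, 71 and 87 samples per batch, which is in line with the batches printed above.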

But with the weighted sampler I get:


(array([0, 1, 2]), array([ 1, 66, 103], dtype=int64))

(array([0, 1, 2]), array([ 1, 75, 94], dtype=int64))

(array([0, 1, 2]), array([ 4, 72, 94], dtype=int64))

(array([0, 1, 2]), array([ 1, 75, 94], dtype=int64))

(array([0, 1, 2]), array([ 4, 61, 105], dtype=int64))

(array([0, 1, 2]), array([ 3, 61, 106], dtype=int64))

What I expected was to get more samples of my under-represented class, but I get fewer of them instead. Furthermore, sometimes there are no class_0 samples in a batch at all, which totally messes up my metrics evaluation. Do you think this is working properly, or is there still a bug?
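
For comparison, here is a minimal sketch of a setup that would be expected to give roughly balanced batches. `WeightedRandomSampler` takes one weight per sample, not one per class, so the class weights are expanded through the targets. Everything about the dataset here is a hypothetical stand-in: the dummy features, the `targets` tensor, and the class counts of 104, 642 and 784 (the counts implied if the weights above are 1 / class_count). It is not a diagnosis of the code in question, which is not shown.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical stand-in dataset with class counts 104, 642 and 784.
targets = torch.cat([
    torch.full((104,), 0, dtype=torch.long),
    torch.full((642,), 1, dtype=torch.long),
    torch.full((784,), 2, dtype=torch.long),
])
features = torch.randn(len(targets), 4)  # dummy features
dataset = TensorDataset(features, targets)

# One weight per class (inverse class frequency) ...
class_counts = torch.bincount(targets)
class_weights = 1.0 / class_counts.float()

# ... expanded to one weight per sample by indexing with the targets.
sample_weights = class_weights[targets]

sampler = WeightedRandomSampler(
    weights=sample_weights,
    num_samples=len(sample_weights),
    replacement=True,  # rare-class samples must be drawn repeatedly
)

# shuffle is left at its default (False) because a sampler is supplied.
loader = DataLoader(dataset, batch_size=170, sampler=sampler)

for _, batch_targets in loader:
    print(torch.unique(batch_targets, return_counts=True))
```

With per-sample weights like these, each class has the same total weight, so each batch should contain about a third of each class on average, with random variation from batch to batch.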