Help with sampling training data for a multi-label classification task?!

Hi there!
I have a dataset in which each sample has multiple labels, and my goal is to build a multi-label classifier for it. There are 2 possible labels, A and B, but each sample can carry up to 19 * 19 * 5 labels depending on some conditions.
Within each sample the labels are not equally represented: every sample shows some label imbalance, and the ratio of label A to label B over the whole dataset is 1000:1. It seems the number of label B examples is simply not enough.
The accuracy is 99.97 % but the recall is very low, around 18 %, which is really an effect of the imbalance. One way to tackle this is a weighted loss function, but that has not been a satisfactory solution for me. I have decided to use oversampling instead, i.e. to preferentially select samples with a high frequency of label B. Could you please tell me how I can implement this kind of sampling with a sampler in PyTorch, i.e. sampling based on the frequency of label B in each sample?


You can build your own sampler, where you take a permutation of the data from the class with the higher frequency while over-sampling from the class with the lower frequency.
For example:
class_vector = [0,0,0,0,0,0,0,1,1,1]
class count = {0: 7, 1: 3}
Build a permutation of the majority indices 0-6 with
np.random.permutation(7), and an over-sampling vector for class 1 with:
np.random.randint(7, 10, 7)  # draws 7 integers from {7, 8, 9}
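Putting those two pieces together, here is a minimal NumPy sketch (using the toy `class_vector` above; the variable names are mine, not from any library) that builds one balanced epoch of indices. The resulting index array can then be fed to `torch.utils.data.SubsetRandomSampler`, or yielded from the `__iter__` of a custom `torch.utils.data.Sampler`:

```python
import numpy as np

# Toy label vector: 7 samples of class 0, 3 samples of class 1
class_vector = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])

# Indices belonging to each class
idx_majority = np.where(class_vector == 0)[0]   # indices 0..6
idx_minority = np.where(class_vector == 1)[0]   # indices 7..9

# Permute the majority class once (each sample used exactly once)
majority_perm = np.random.permutation(idx_majority)

# Over-sample the minority class WITH replacement up to the majority count
minority_over = np.random.choice(idx_minority, size=len(idx_majority), replace=True)

# Shuffle the combined indices to get one balanced epoch
epoch_indices = np.random.permutation(np.concatenate([majority_perm, minority_over]))
```

Note that if a plain per-sample weighting is enough for your case, PyTorch also ships `torch.utils.data.WeightedRandomSampler`, which draws indices with replacement according to a weight you assign to each sample (e.g. proportional to its label B frequency).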

If you do not want data to repeat, you can do stratified sampling instead:

helpful links: