@ptrblck, I am trying to use WeightedRandomSampler to handle class imbalance in my dataset. However, the intuition behind it is not clear to me. My target labels are in the form of one-hot encoded vectors, as below.
train_labels.head(5)

   none  infection  ischaemia  both
0     1          0          0     0
1     1          0          0     0
2     0          1          0     0
3     0          1          0     0
4     0          1          0     0
Below are the steps I used to compute the weights for the sampler. Please correct me if my interpretation of any step is wrong.
1. Count the number of samples per class in the dataset
class_sample_count = np.array(train_labels.value_counts())
class_sample_count
array([2555, 2552, 621, 227])
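For context, since the labels are one-hot columns, the per-class counts can equivalently be read off by summing each column. A minimal sketch, where `train_labels` is a small stand-in for the real DataFrame:

```python
import numpy as np
import pandas as pd

# Small stand-in for the real train_labels DataFrame (one-hot columns).
train_labels = pd.DataFrame(
    {"none":      [1, 1, 0, 0, 0],
     "infection": [0, 0, 1, 1, 1],
     "ischaemia": [0, 0, 0, 0, 0],
     "both":      [0, 0, 0, 0, 0]}
)

# Summing each one-hot column gives the number of samples per class.
class_sample_count = train_labels.sum(axis=0).to_numpy()
print(class_sample_count)  # [2 3 0 0]
```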
2. Calculate the weight associated with each class
weight = 1. / class_sample_count
weight
array([0.00039139, 0.00039185, 0.00161031, 0.00440529])
3. Calculate the weight for each sample in the dataset
samples_weight = np.array(weight[train_labels])
print(samples_weight[1], samples_weight[2])
[0.00039185 0.00039139 0.00039139 0.00039139] #label 0 in actual data
[0.00039139 0.00039185 0.00039139 0.00039139] #label 1 in actual data
The shape of samples_weight comes out to be [5955, 4]: 5955 is the total number of images in the original set, and 4 corresponds to the number of classes.
Now, how was this mapping done? The class weight for class 0 is 0.00039139 (obtained in step 2), so how were the other three entries for class 0 picked?
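For reference, I believe this is plain NumPy integer-array indexing: because train_labels holds 0/1 one-hot values rather than class indices, weight[train_labels] substitutes weight[0] for every 0 and weight[1] for every 1, which is what produces a [N, 4] array. A sketch using the weights from step 2:

```python
import numpy as np

# Per-class weights from step 2.
weight = np.array([0.00039139, 0.00039185, 0.00161031, 0.00440529])

# A one-hot row encoding class 1 ("infection"): [0, 1, 0, 0].
one_hot_row = np.array([0, 1, 0, 0])

# Fancy indexing with 0/1 values picks weight[0] for each 0 and
# weight[1] for each 1 -- not the weight of the encoded class.
print(weight[one_hot_row])
# [0.00039139 0.00039185 0.00039139 0.00039139]
```

This reproduces the second row printed above, which is why every sample row is built only from weight[0] and weight[1].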
4. Convert the np.array to a tensor
samples_weight = torch.from_numpy(samples_weight)
samples_weight
tensor([[0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        ...,
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004]], dtype=torch.float64)
After conversion to a tensor, all the samples appear to have (nearly) the same value in all four entries. How, then, does WeightedRandomSampler oversample the minority classes?
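For comparison, the pattern I have seen in other forum threads first converts the one-hot rows to integer class indices (e.g. via argmax), so that each sample gets the single weight of its own class and samples_weight has shape [N], not [N, 4]. A sketch with a small stand-in label matrix:

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# Stand-in one-hot label matrix (rows: samples, cols: 4 classes).
labels_onehot = np.array([[1, 0, 0, 0],
                          [1, 0, 0, 0],
                          [0, 1, 0, 0],
                          [0, 0, 1, 0],
                          [0, 0, 0, 1]])

# Convert one-hot rows to integer class indices.
targets = labels_onehot.argmax(axis=1)            # [0, 0, 1, 2, 3]

# Per-class weights: inverse class frequency.
class_sample_count = np.bincount(targets, minlength=4)
weight = 1.0 / class_sample_count

# One weight per sample (shape [N]).
samples_weight = torch.from_numpy(weight[targets]).double()

sampler = WeightedRandomSampler(weights=samples_weight,
                                num_samples=len(samples_weight),
                                replacement=True)
```

The sampler would then be passed to the DataLoader via its sampler argument, e.g. DataLoader(dataset, sampler=sampler, batch_size=...), where dataset is whatever Dataset holds these samples.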
I will be grateful for any leads. Thank you.