@ptrblck, I am trying to use WeightedRandomSampler to handle class imbalance in my dataset. However, the intuition behind it is not clear to me. My target labels are one-hot encoded vectors, as shown below.
train_labels.head(5)
| | none | infection | ischaemia | both |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 |
| 3 | 0 | 1 | 0 | 0 |
| 4 | 0 | 1 | 0 | 0 |
Below are the steps I used to calculate the weights for the weighted random sampler. Please correct me if I have misinterpreted any of them.
- Count the number of samples per class in the dataset
class_sample_count = np.array(train_labels.value_counts()) 
class_sample_count
array([2555, 2552,  621,  227])
- Calculate the weight associated with each class
weight = 1. / class_sample_count 
weight
array([0.00039139, 0.00039185, 0.00161031, 0.00440529])
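One thing I am unsure about here: as far as I can tell, `value_counts()` on a DataFrame counts unique rows and orders the result by frequency, so the counts above may not line up with the column order (none, infection, ischaemia, both). A minimal sketch of counting per column instead, on a small made-up one-hot array (the toy data and variable names are assumptions, not my real dataset):

```python
import numpy as np

# Toy one-hot labels (hypothetical data); columns: none, infection, ischaemia, both
one_hot = np.array([
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
])

# Summing each column of a one-hot matrix gives the count per class,
# in the same order as the columns
class_sample_count = one_hot.sum(axis=0)

# Inverse-frequency weight per class, indexed by column position
weight = 1.0 / class_sample_count

print(class_sample_count)  # [2 3 1 1]
print(weight)
```

With counts in column order, `weight[k]` refers to the class in column `k`, which is what the later indexing steps rely on.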
- Calculate the weight for each of the samples in the dataset.
samples_weight = np.array(weight[train_labels])
print(samples_weight[1], samples_weight[2] )
[0.00039185 0.00039139 0.00039139 0.00039139] #label 0 in actual data 
[0.00039139 0.00039185 0.00039139 0.00039139] #label 1 in actual data
The shape of samples_weight comes out as [5955, 4], where 5955 is the total number of images in the original set and 4 is the number of classes.
How has this mapping been done? The class weight for class 0 is 0.00039139 (obtained in step 2), so how were the other three entries in each row picked for class 0?
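My current guess is that this comes from NumPy integer-array indexing: indexing `weight` with the one-hot matrix treats every 0 and 1 in it as an index, so each 0 becomes `weight[0]` and each 1 becomes `weight[1]`, and the weights of classes 2 and 3 are never used. A minimal sketch of that behaviour and of one possible alternative (converting the one-hot rows to class indices with `argmax` first), using two made-up rows:

```python
import numpy as np

weight = np.array([0.00039139, 0.00039185, 0.00161031, 0.00440529])

# Two one-hot rows (hypothetical): first column set, then second column set
one_hot = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
])

# Integer-array indexing: each 0 is replaced by weight[0], each 1 by weight[1],
# so the result has the same shape as the one-hot matrix
per_entry = weight[one_hot]
print(per_entry.shape)  # (2, 4)
print(per_entry[0])

# Possible alternative: turn each one-hot row into a class index first,
# then look up exactly one weight per sample
class_idx = one_hot.argmax(axis=1)
samples_weight = weight[class_idx]
print(samples_weight.shape)  # (2,)
```

This would explain why every row contains only the two values 0.00039139 and 0.00039185, with 0.00039185 sitting at the position of the 1.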
- Convert the np.array to tensor
samples_weight = torch.from_numpy(samples_weight)
samples_weight
tensor([[0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        ...,
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004],
        [0.0004, 0.0004, 0.0004, 0.0004]], dtype=torch.float64)
After conversion to a tensor, all the samples appear to have the same value in all four entries. How, then, does WeightedRandomSampler oversample the minority class?
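Part of this may just be display precision: the tensor printout shows four decimal places, so 0.00039139 and 0.00039185 both appear as 0.0004. As I understand it, the sampler takes a sequence with one weight per sample rather than one row per sample. Below is a rough sketch of how I think it is meant to be used, on made-up class indices (the toy data, the 200 repetitions, and the variable names are assumptions for illustration only):

```python
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

# Toy class indices (hypothetical): 8 samples of class 0, 2 samples of class 1
class_idx = np.array([0] * 8 + [1] * 2)

# Inverse-frequency weight per class, then one weight per sample
class_sample_count = np.bincount(class_idx)
weight = 1.0 / class_sample_count
samples_weight = torch.from_numpy(weight[class_idx])  # 1-D, length 10

sampler = WeightedRandomSampler(
    weights=samples_weight,
    num_samples=len(samples_weight),
    replacement=True,  # needed so minority samples can be drawn repeatedly
)

# Iterate the sampler many times and count how often each class is drawn
drawn = np.concatenate([np.array(list(sampler)) for _ in range(200)])
picked = np.bincount(class_idx[drawn], minlength=2)
print(picked / picked.sum())  # each class should come out near 0.5
```

In this sketch each class carries the same total weight (8 × 0.125 and 2 × 0.5), so the minority class is drawn about as often as the majority class. The sampler would then be passed to the loader as `DataLoader(dataset, sampler=sampler)`.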
I will be grateful for any leads. Thank you.