How can we be sure that weights are mapped to the proper class? During training, the dataset is served in (shuffled) batches. Should I map the weights according to how the model sees each label in each batch?

I’m asking because I would be much more comfortable passing a dictionary of weights to the loss function instead of a list.
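If it helps, a dict keyed by class index can always be converted to the ordered tensor the loss expects. A minimal sketch, assuming a hypothetical 3-class problem where `class_weights` maps each 0-based class index to its weight:

```python
import torch

# Hypothetical dict: class index -> weight
class_weights = {0: 0.4, 1: 0.2, 2: 0.4}

# The loss function expects a tensor ordered by class index,
# so build a list sorted by the dict's keys
weights = torch.tensor([class_weights[c] for c in sorted(class_weights)])
print(weights)  # tensor([0.4000, 0.2000, 0.4000])
```

Sorting by key guarantees the tensor order matches the class indices, regardless of the order the dict was built in.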

In my understanding, classes are assigned integers (from 0 to num_classes - 1).
Following this, weights will be a list of length num_classes, with weights[i] corresponding to the weight of the i-th class.
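Concretely, that positional list is what `nn.CrossEntropyLoss` consumes. A minimal sketch, with made-up weights and random data for a 3-class problem:

```python
import torch
import torch.nn as nn

# weights[i] is the weight applied to class i (values here are illustrative)
weights = torch.tensor([0.4, 0.2, 0.4])
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)            # batch of 8 samples, 3 classes
targets = torch.randint(0, 3, (8,))   # integer labels in [0, 3)
loss = criterion(logits, targets)
print(loss.item())
```

The mapping is purely positional: element 0 of the weight tensor always applies to class 0, no matter how the batches are shuffled.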

Can you elaborate a bit on how you calculate the weights?

Also, the two weights lists are different here.

Basically, the weights list has length equal to the total number of classes (not the number of samples), and weights[i] is the weight calculated for class i.
I am not sure how your example relates to this definition. Maybe I am a bit confused with your example.

You are totally right; let me explain better now (and correct my mistakes).

For example, imagine I have a dataframe df like this:

| var | label |
|-----|-------|
| a   | 2     |
| b   | 2     |
| c   | 1     |
| d   | 3     |

In the example before, I calculated the class weights as

w = 1 / (df.label.value_counts()/df.label.value_counts().sum())
w = w/w.sum()

which leads to these class weights (you are right: the length is the total number of classes, not the number of samples):

2 0.2
3 0.4
1 0.4

Now, when I pass w to my loss function through torch.from_numpy(w.values), should I first order it so that it reflects the order of the classes (1, 2, 3), therefore [0.4, 0.2, 0.4]? Or should I pass it as-is, therefore [0.2, 0.4, 0.4]?

Thanks for explaining!
Assuming that you are mapping the classes [1, 2, 3] onto PyTorch's equivalent 0-based classes [0, 1, 2], you have to pass [0.4, 0.2, 0.4], as you mentioned.
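To avoid doing that reordering by hand, you can sort the `value_counts()` Series by its index before converting it to a tensor. A sketch using the same toy dataframe from above:

```python
import pandas as pd
import torch

df = pd.DataFrame({"var": ["a", "b", "c", "d"], "label": [2, 2, 1, 3]})

# Inverse-frequency weights, as computed earlier in the thread
w = 1 / (df.label.value_counts() / df.label.value_counts().sum())
w = w / w.sum()

# sort_index() orders the Series by class label (1, 2, 3),
# so position i matches the 0-based class index the model uses
w = w.sort_index()
weights = torch.from_numpy(w.values)
print(weights)  # tensor([0.4000, 0.2000, 0.4000], dtype=torch.float64)
```

Note that `value_counts()` returns counts sorted by frequency, not by label, which is exactly why the explicit `sort_index()` is needed before `torch.from_numpy`.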