I'm trying to use class weights with the CrossEntropy loss function, and there is no clear-cut way to compute them in PyTorch. Currently I'm using method 1, but it penalizes the loss by a greater amount than the second method: the class weights from method 1 are much higher for the minority classes, which means the parameters get more aggressive updates and, I assume, potentially distorts the pretrained weights. Which of the two is the standard approach in PyTorch?
Method 1: total / (samples_per_class × num_classes)
Method 2: 1 / samples_per_class
import torch

# class_counts[i] = number of training samples with label i
class_counts = train_df['level'].value_counts().sort_index().tolist()
total = len(train_df)

# Method 1: total / (num_classes * samples_per_class)
weights = [total / (len(class_counts) * c) for c in class_counts]
weights = torch.tensor(weights, dtype=torch.float32)
weights = weights.to(device)
"""
below is the current training distribution for prod, which we use to generate class weights
0 18067
2 3704
1 1710
3 611
4 496
"""
print(weights)
and I get: tensor([0.2722, 2.8754, 1.3271, 8.0607, 9.9333], device='cuda:0')
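For reference, here is a minimal standalone sketch of what I mean by the two methods, assuming the class counts from the distribution above (listed in label order 0-4) and that the resulting tensor is passed to nn.CrossEntropyLoss via its weight argument; the rest of the training setup is omitted:

import torch
import torch.nn as nn

# Counts per label 0..4, taken from the distribution above
class_counts = [18067, 1710, 3704, 611, 496]
total = sum(class_counts)
num_classes = len(class_counts)

# Method 1: total / (samples_per_class * num_classes)
w1 = torch.tensor([total / (num_classes * c) for c in class_counts], dtype=torch.float32)

# Method 2: 1 / samples_per_class
w2 = torch.tensor([1.0 / c for c in class_counts], dtype=torch.float32)

print(w1)  # should be close to the tensor printed above
print(w2)  # much smaller absolute values, same ratios between classes

# Either tensor can be passed to the loss through the weight argument
criterion = nn.CrossEntropyLoss(weight=w1)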