Multi-class BERT Model: Class weights and where to use them

  1. Yes, the weighted loss can also be used on the validation set to get a signal about overfitting, as @KFrank explains in this post.
  2. Both datasets should come from the same domain and have the same or similar distributions as also explained in the linked post.
  3. For the final run on the test data, calculate the metric that most closely represents your target. E.g. the accuracy for an imbalanced use case could be high even though your model is useless and predicts only the majority class (the accuracy paradox).
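
To make the accuracy paradox from point 3 concrete, here is a minimal sketch in plain Python. The 90/5/5 class split, the always-predict-the-majority "model", and the helper names `accuracy` and `macro_recall` are made up for illustration and are not from the original question. Macro recall (the mean of the per-class recalls, often called balanced accuracy) is just one possible alternative; which metric fits best depends on your use case.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose prediction matches the label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_recall(y_true, y_pred):
    """Unweighted mean of the per-class recalls (balanced accuracy)."""
    support = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    return sum(hits[c] / support[c] for c in support) / len(support)


# Hypothetical imbalanced 3-class test set: 90 / 5 / 5 samples.
y_true = [0] * 90 + [1] * 5 + [2] * 5
# A useless "model" that always predicts the majority class 0.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
bal = macro_recall(y_true, y_pred)
print(f"accuracy={acc:.2f}  macro recall={bal:.2f}")
```

Here the accuracy is 0.90 although classes 1 and 2 are never predicted, while the macro recall is only about 0.33 (1.0 for class 0, 0.0 for the other two), which reflects the actual behavior much better.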