How can overfitting be detected using weighted binary cross entropy?

Hi Maryam!

Yes, provided that what you call your “error” is the loss criterion you use
for training (in your case weighted binary cross entropy) or, if your error
is not your actual training loss, that your training loss is at least – to use
@ptrblck’s language – a good “proxy for the metric you care about.”

The short story is that, provided your training and validation datasets are
similar in character and you use the same loss criterion for both – including
class weights, if you use them – divergence between your validation
and training losses indicates overfitting.

In a narrow technical sense you can still talk about overfitting even if your
training loss isn’t a good match for the “metric you care about” (although in
such a case you won’t be training your model to do what you really want
it to do) – it’s still logically consistent to say that you are overfitting your
training set with respect to the loss criterion you are training with.

What matters is that your training and validation datasets have the same
character and, in the case of unbalanced data, the same relative class
frequencies (so that the same class weights apply to both). The cleanest
way to do this – if you can, which you generally should be able to – is to
have your training and validation datasets come from the same larger dataset.

Let’s say you have 10,000 individual data samples. If you randomly split
this dataset into a training set of, say, 8,000 samples and a validation set
of 2,000 samples, the two datasets will be statistically identical. In particular,
they will have (statistically*) the same relative class frequencies.
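
For example, here is a minimal sketch of such an 80/20 random split using
pytorch’s random_split. (The dataset here is a toy stand-in with placeholder
names, purely for illustration.)

```python
import torch
from torch.utils.data import TensorDataset, random_split

# toy stand-in for your real dataset of 10,000 samples, roughly 1% positive
features = torch.randn(10000, 16)
labels = (torch.rand(10000) < 0.01).float()
full_dataset = TensorDataset(features, labels)

# random 80/20 split -- both pieces are drawn from the same underlying data
generator = torch.Generator().manual_seed(2023)
train_set, val_set = random_split(full_dataset, [8000, 2000], generator=generator)
```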

Train your model on your training set with a loss criterion of weighted binary
cross entropy
and also track the same weighted binary cross entropy on your validation
set. If your validation-set loss starts going up, even as your training-set loss
keeps going down, overfitting has set in, and further training is actually
making your model worse, rather than better. (Note, if the “metric you care
about” doesn’t track your training loss, then your model may or may not
perform well on the “metric you care about,” but that’s a separate issue
from overfitting.)
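
Continuing from the split sketch above, here is a minimal sketch of tracking
the same weighted criterion on both sets. (The model, learning rate, and
pos_weight value are placeholders – pos_weight is just the rough
negative-to-positive ratio for 1% positive samples; use whatever weighting
you actually train with.)

```python
import torch
from torch.utils.data import DataLoader

# a generic placeholder model that returns raw logits
model = torch.nn.Sequential(
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# one weighted criterion, used for both training and validation
pos_weight = torch.tensor([99.0])   # roughly 99 negatives per positive
criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight)

train_loader = DataLoader(train_set, batch_size=100, shuffle=True)
val_loader = DataLoader(val_set, batch_size=100)

for epoch in range(20):
    model.train()
    train_loss = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x).squeeze(1), y)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * x.size(0)
    train_loss /= len(train_set)

    model.eval()
    val_loss = 0.0
    with torch.no_grad():
        for x, y in val_loader:
            val_loss += criterion(model(x).squeeze(1), y).item() * x.size(0)
    val_loss /= len(val_set)

    # validation loss rising while training loss keeps falling signals overfitting
    print(f'epoch {epoch:2d}:  train loss {train_loss:.4f}   val loss {val_loss:.4f}')
```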

*) Let’s say that your whole dataset of 10,000 contains 1% positive
samples – that is, 100 positive samples – and you randomly split it 80/20.
Then your validation set of 2,000 samples will contain about 20 positive
samples. But this number could easily range from something like 15 to
25 positive samples, which could be enough of a difference to somewhat
affect your validation-loss computation. If you want to be fancier, you
could separately randomly split your positive and negative samples 80/20
so that your validation set contains exactly 20 positive samples (and your
training set, 80). The smaller your dataset and the greater your class
imbalance, the more this nuance will matter.
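
If it helps, here is a minimal sketch of that stratified split (again with
placeholder names matching the sketches above):

```python
import torch
from torch.utils.data import TensorDataset, Subset

# toy stand-in dataset, roughly 1% positive
features = torch.randn(10000, 16)
labels = (torch.rand(10000) < 0.01).float()
full_dataset = TensorDataset(features, labels)

# shuffle the positive and negative indices separately
g = torch.Generator().manual_seed(2023)
pos_idx = torch.nonzero(labels == 1).squeeze(1)
neg_idx = torch.nonzero(labels == 0).squeeze(1)
pos_idx = pos_idx[torch.randperm(len(pos_idx), generator=g)]
neg_idx = neg_idx[torch.randperm(len(neg_idx), generator=g)]

# take 80% of each class for training and the remaining 20% for validation
n_pos_train = int(0.8 * len(pos_idx))
n_neg_train = int(0.8 * len(neg_idx))
train_idx = torch.cat([pos_idx[:n_pos_train], neg_idx[:n_neg_train]])
val_idx = torch.cat([pos_idx[n_pos_train:], neg_idx[n_neg_train:]])

train_set = Subset(full_dataset, train_idx.tolist())
val_set = Subset(full_dataset, val_idx.tolist())
```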

Best.

K. Frank