Hi Aquafina!
Let me illustrate the scheme with a simplified example.
First, let me note that it is perfectly reasonable to weight multi-lesion images with
varying weights. It’s just something of a pain in the neck to choose such weights
in a way that is useful. In this example, all multi-lesion images will be given the
same weight and we will rely on your statement that you have enough single-lesion
images of each lesion class to be able to compensate for the class imbalance by
reweighting just the single-lesion images.
Let’s say you have just three lesion classes, lesion-A, lesion-B, and lesion-C. Let’s
say that your training set consists of 30 lesion-A, 10 lesion-B, and 5 lesion-C images, as
well as 5 multi-lesion images. Furthermore, let’s say – just for simplicity – that in
aggregate the multi-lesion images contain 30 lesion-A’s, 10 lesion-B’s, and 5 lesion-C’s.
(So across your entire training set – both single- and multi-lesion images – you have
a total of 60 lesion-A’s, 20 lesion-B’s, and 10 lesion-C’s.)
We can compensate for the class imbalance by using the following image weights:
multi-lesion images all have weight 1, lesion-A images have weight 1, lesion-B images
have weight 5, and lesion-C images have weight 11.
Now the multi-lesion images contribute (30, 10, 5) to the weighted class counts. (These
are just the counts of the lesion classes in the multi-lesion images with weight 1.) The
30 lesion-A images – with weight 1 – contribute 30 to the lesion-A weighted class count,
for a total lesion-A weighted class count of 60. With weight 5, the 10 lesion-B images
contribute 50 to the lesion-B weighted class count, for a total of 60. Lastly, with weight
11, the 5 lesion-C images contribute 55 to the lesion-C weighted class count, again for
a total of 60.
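To double-check the arithmetic, here is a small Python sketch that recomputes the weighted class counts from the numbers in this example. The dictionary names and the helper `weighted_class_counts()` are made up for illustration; nothing here depends on any particular library.

```python
# Per-class counts from the example (lesion classes A, B, and C).
single_counts = {"A": 30, "B": 10, "C": 5}   # single-lesion images of each class
multi_counts = {"A": 30, "B": 10, "C": 5}    # lesions of each class inside the 5 multi-lesion images
single_weights = {"A": 1, "B": 5, "C": 11}   # weight given to each single-lesion image, by class
multi_weight = 1                             # every multi-lesion image gets the same weight


def weighted_class_counts(single_counts, multi_counts, single_weights, multi_weight):
    """Weighted count of each lesion class over the whole training set.

    Because all multi-lesion images share one weight, their contribution to
    class c is just multi_weight times the number of class-c lesions they contain.
    """
    return {
        c: multi_weight * multi_counts[c] + single_weights[c] * single_counts[c]
        for c in single_counts
    }


print(weighted_class_counts(single_counts, multi_counts, single_weights, multi_weight))
# {'A': 60, 'B': 60, 'C': 60}
```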
The basic idea is that you just take however many of each lesion class you get from the
multi-lesion images (so that we don’t have to worry about how to weight them) and then
use the single-lesion images – which you say you have – to “top up” the weighted
class counts of any underrepresented lesion classes, choosing an appropriate
weight for the single-lesion images of each such class.
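Here is one way you might turn the “top up” idea into a formula, offered as a sketch rather than the only option. Assuming every multi-lesion image has weight 1, if m_c is the number of class-c lesions in the multi-lesion images, s_c is the number of single-lesion class-c images, and T is the weighted count you are aiming for, then the weight for the single-lesion class-c images is w_c = (T - m_c) / s_c. The helper `top_up_weights()` is hypothetical, and using the largest unweighted class total as T is just one convenient choice (an assumption, not a requirement). With the numbers above it reproduces the weights 1, 5, and 11.

```python
def top_up_weights(single_counts, multi_counts, target=None):
    """Weights w[c] for single-lesion images such that
    multi_counts[c] + w[c] * single_counts[c] == target for every class c.

    Assumes every multi-lesion image has weight 1. If target is None, the
    largest unweighted class total is used, which keeps every w[c] >= 1.
    """
    totals = {c: single_counts[c] + multi_counts[c] for c in single_counts}
    if target is None:
        target = max(totals.values())
    weights = {}
    for c, n_single in single_counts.items():
        if n_single == 0:
            raise ValueError(f"class {c} has no single-lesion images to top up with")
        shortfall = target - multi_counts[c]  # weighted count still needed for class c
        if shortfall < 0:
            raise ValueError(f"multi-lesion images alone exceed the target for class {c}")
        weights[c] = shortfall / n_single
    return weights


single_counts = {"A": 30, "B": 10, "C": 5}
multi_counts = {"A": 30, "B": 10, "C": 5}
print(top_up_weights(single_counts, multi_counts))
# {'A': 1.0, 'B': 5.0, 'C': 11.0}
```

Nothing requires the weights to be integers; in this example they just happen to come out that way.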
(Note that, in this example, the chosen weights give us exactly equal weighted class counts.
This is not necessary – they just have to be about the same. Without weights, you have
six times as many lesion-A’s as lesion-C’s – probably too much of an imbalance. You
have twice as many lesion-B’s as lesion-C’s – while that’s probably okay, it’s not ideal.
If you get your weighted class counts to be the same to within 10% or 20%, that should be
fine – they don’t need to be exactly equal.)
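If you want a quick check for “about the same,” something like the following could work. Reading “the same to within 10% or 20%” as “the largest weighted class count is at most 10% or 20% larger than the smallest” is an assumption; you could reasonably measure the spread in other ways. The function names `imbalance_ratio()` and `within_tolerance()` and the `nearly_balanced` counts are hypothetical.

```python
def imbalance_ratio(counts):
    """Ratio of the largest class count to the smallest one."""
    return max(counts.values()) / min(counts.values())


def within_tolerance(counts, tol=0.2):
    """True if the largest count exceeds the smallest by at most tol (0.2 means 20%)."""
    lo, hi = min(counts.values()), max(counts.values())
    return hi <= (1 + tol) * lo


unweighted = {"A": 60, "B": 20, "C": 10}       # totals with no weighting
weighted = {"A": 60, "B": 60, "C": 60}         # totals with weights 1, 5, 11
nearly_balanced = {"A": 60, "B": 57, "C": 53}  # made-up counts, not exactly equal

print(imbalance_ratio(unweighted))             # 6.0
print(within_tolerance(unweighted))            # False
print(within_tolerance(weighted, tol=0.1))     # True
print(within_tolerance(nearly_balanced, 0.2))  # True
print(within_tolerance(nearly_balanced, 0.1))  # False
```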
Best.
K. Frank