Line segmentation of a rugby field with multiple classes

I have a task to segment the different lines of a rugby field:

Each line is labeled with a different class (11 classes + background).

  1. Which model would you use for such a task? I thought about U-Net; is there anything more suitable?
  2. In each frame of the TV stream, the different classes take up only 2-4% of the pixels compared to the background. Is there a smart way to handle that?
  3. What loss would you choose? Is there anything optimized for such a case?

Hi Mark!

U-Net is a sound model for semantic segmentation. You should certainly
give it a try. (It would probably be my first choice.)

For the class imbalance: use CrossEntropyLoss’s weight constructor
argument, with class weights that are (approximately) inversely
proportional to the frequencies with which pixels of each class appear
in your training data.
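As a rough sketch (assuming your ground-truth masks are stored as integer label maps, with one class index per pixel), computing such weights might look like this:

```python
import torch

def inverse_frequency_weights(masks: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Per-class weights inversely proportional to pixel frequency.

    masks: LongTensor of class indices (e.g., [N, H, W]).
    """
    counts = torch.bincount(masks.flatten(), minlength=num_classes).float()
    freqs = counts / counts.sum()
    # Guard against classes that never appear in this sample of masks.
    return 1.0 / freqs.clamp(min=1e-8)

# Example: 11 line classes + background = 12 classes.
masks = torch.randint(0, 12, (4, 256, 256))  # stand-in for real label maps
weights = inverse_frequency_weights(masks, num_classes=12)
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```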

As for the loss: use CrossEntropyLoss. It’s the go-to loss function for
multi-class classification (of which multi-class semantic segmentation
is a type). There are other losses, but if I were to use them at all, I
would probably use them as adjuncts to CrossEntropyLoss rather than as
replacements for it.
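For segmentation, CrossEntropyLoss expects per-pixel logits of shape [N, C, H, W] and integer class-index targets of shape [N, H, W]. A minimal sketch (the shapes here are just illustrative):

```python
import torch

num_classes = 12  # 11 line classes + background
loss_fn = torch.nn.CrossEntropyLoss()

# Stand-ins for a batch of U-Net outputs and ground-truth label maps.
logits = torch.randn(2, num_classes, 64, 64, requires_grad=True)  # [N, C, H, W]
targets = torch.randint(0, num_classes, (2, 64, 64))              # [N, H, W]

loss = loss_fn(logits, targets)  # averaged over every pixel in the batch
loss.backward()
```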

Do you have annotated ground-truth training data? (The more, the
better.) Note that adjacent frames in the TV stream are likely to be
very similar to one another, so they won’t really count as independent
data samples.

Best.

K. Frank


@KFrank, thanks for the advice.
I have ~100 frames, and they're pretty different from one another.

So all I need to do is set the weights of the 11 classes to be high relative to the single background class?

Hi Mark!

That doesn’t seem like a lot – training might be a bit tricky.

Maybe split into 80 frames for training and 20 frames for validation. Track
your loss and some performance metrics, e.g., accuracy and per-class
intersection-over-union, for both your training and validation sets as you
train. If your model keeps working better on your training set, but starts
working worse on your validation set, you’ve started to overfit, a common
problem when you don’t have a lot of training data.
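Per-class intersection-over-union is straightforward to compute from predicted and ground-truth label maps; here is one way to do it (a sketch, not tied to any particular metrics library):

```python
import torch

def per_class_iou(preds: torch.Tensor, targets: torch.Tensor, num_classes: int) -> torch.Tensor:
    """IoU per class; preds and targets are label maps of class indices."""
    ious = torch.full((num_classes,), float('nan'))
    for c in range(num_classes):
        pred_c = preds == c
        target_c = targets == c
        union = (pred_c | target_c).sum()
        if union > 0:  # leave NaN for classes absent from both maps
            ious[c] = (pred_c & target_c).sum().float() / union.float()
    return ious

# Predictions would typically come from logits.argmax(dim=1).
preds = torch.randint(0, 12, (2, 64, 64))
targets = torch.randint(0, 12, (2, 64, 64))
print(per_class_iou(preds, targets, num_classes=12))
```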

If you do start to overfit, you could try data augmentation, but it’s not a
magic bullet. (The magic bullet is more data.)
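One caveat if you do try augmentation: geometric transforms have to be applied identically to the image and its mask, and the mask must be resampled with nearest-neighbor interpolation so its values stay valid class indices. A sketch using torchvision (the specific transforms and ranges are just illustrative):

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def augment(image, mask):
    """Apply the same random flip / small rotation to an image and its mask.

    image: [C, H, W] tensor (or PIL image); mask: [1, H, W] tensor of class indices.
    """
    if random.random() < 0.5:
        image, mask = TF.hflip(image), TF.hflip(mask)
    angle = random.uniform(-5.0, 5.0)
    image = TF.rotate(image, angle, interpolation=InterpolationMode.BILINEAR)
    # Nearest-neighbor keeps the mask's values as valid integer class labels.
    mask = TF.rotate(mask, angle, interpolation=InterpolationMode.NEAREST)
    return image, mask
```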

As for your question about the weights: basically, yes.

Let’s say 10% of your pixels were foreground, with each foreground class
accounting for about 1% of the pixels. Then you would use a weight of
about 1.1 (100 / 90) for your background class and weights of about
100.0 for each of your foreground classes. Your foreground classes are
probably not equally balanced, so, for example, if one of your foreground
classes accounted for only 0.5% of your pixels, you would use a weight of
about 200.0 for it.
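In code, that made-up example would come out to something like:

```python
import torch

# Hypothetical pixel frequencies from the example above: 90% background,
# ten line classes at about 1% each, and one rarer class at 0.5%.
freqs = torch.tensor([0.90] + [0.01] * 10 + [0.005])
weights = 1.0 / freqs  # approximately [1.1, 100.0, ..., 100.0, 200.0]
loss_fn = torch.nn.CrossEntropyLoss(weight=weights)
```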

Note, the weights need not be particularly precise – they should just
roughly compensate for the class imbalance. If your results depend on
the exact values of your weights, that would be a sign of some other
problem such as unstable training or not enough data.

Best.

K. Frank
