Segmentation and number of classes

Hi, I am reading the torchvision reference code for the segmentation models. What I am trying to understand is this: since the number of pretraining classes is 21, should the target mask contain integers from 0 to 21, where 0 is the background?
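To make the question concrete, here is a minimal sketch of the shapes and dtypes I have in mind. It uses random tensors instead of a real model, and the variable names are my own. As far as I understand, with 21 output channels the cross entropy loss accepts class indices up to num_classes - 1, so the fake mask below only holds values 0 through 20.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Assumption: 21 output channels, with index 0 meant as background.
num_classes = 21

# Stand-in for the model output: (batch, classes, height, width).
logits = torch.randn(2, num_classes, 4, 4)

# Stand-in for the target mask: (batch, height, width), integer class
# indices stored as int64. torch.randint excludes the upper bound, so
# the values fall in 0..20.
target = torch.randint(0, num_classes, (2, 4, 4))

# Mean cross entropy over all pixels.
loss = F.cross_entropy(logits, target)
print(target.dtype, target.min().item(), target.max().item(), loss.item())
```

If this is not how the target is supposed to look, that is exactly the part I would like to have corrected.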

If so, my next question is: do we need to compute the loss over 0 (background)? I think background pixels are generally the majority, so will including them hurt the model and push it to always predict 0 (background)? Or should we ignore the background label by passing ignore_index=0 to the cross entropy loss? I see that here we are ignoring index 255, but I am not sure where 255 comes from.
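Here is a small sketch of the two options I am comparing, again on random logits. The mask layout, the object class 5, and the use of 255 as a marker for pixels without a valid label are all assumptions I made up for illustration. With reduction="none" the ignored pixels come back with a loss of exactly zero, which makes it easy to count which pixels actually contribute.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 21
logits = torch.randn(1, num_classes, 4, 4)

# Hypothetical 4x4 mask: mostly background (0), a 2x2 object of class 5,
# and a top row marked 255 (assumed to mean "no valid label").
target = torch.zeros(1, 4, 4, dtype=torch.long)
target[0, 1:3, 1:3] = 5
target[0, 0, :] = 255

# Option A: ignore only 255. Background pixels still contribute.
per_pixel_a = F.cross_entropy(logits, target, ignore_index=255, reduction="none")
mean_a = F.cross_entropy(logits, target, ignore_index=255)
valid_a = target != 255
# The default mean is taken over the non-ignored pixels only.
manual_mean_a = per_pixel_a[valid_a].mean()

# Option B: ignore background. The 255 pixels are relabelled to 0 first,
# because 255 is not a valid class index once it is no longer ignored.
target_b = target.clone()
target_b[target_b == 255] = 0
per_pixel_b = F.cross_entropy(logits, target_b, ignore_index=0, reduction="none")

print("pixels contributing, ignore_index=255:", int(valid_a.sum()))
print("pixels contributing, ignore_index=0:  ", int((target_b != 0).sum()))
```

In this toy mask, option A trains on 12 of the 16 pixels (8 background and 4 object), while option B trains on the 4 object pixels only, so with option B the model never receives any gradient telling it what background looks like.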

Thanks! Any input is highly appreciated!