For a segmentation use case, focal loss or a pixel-wise weighting of the minority classes might help.
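To make the two options concrete, here is a minimal plain-Python sketch of the focal loss for a single pixel, with an optional per-class weight. The function name `focal_loss_pixel`, the `gamma=2.0` default, and the example weights are assumptions for illustration, not part of any library API; in practice you would apply the same formula to the whole prediction tensor in your framework.

```python
import math

def focal_loss_pixel(probs, target, gamma=2.0, class_weights=None):
    """Focal loss for one pixel (hypothetical helper).

    probs: per-class probabilities for this pixel (should sum to 1)
    target: index of the true class
    gamma: focusing parameter; gamma=0 reduces to plain cross entropy
    class_weights: optional per-class weights, e.g. larger for minority classes
    """
    p_t = max(probs[target], 1e-12)  # clamp to avoid log(0)
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * (1.0 - p_t) ** gamma * math.log(p_t)

# A confidently correct pixel contributes far less than a misclassified one.
easy = focal_loss_pixel([0.9, 0.1], target=0)
hard = focal_loss_pixel([0.9, 0.1], target=1)

# Up-weighting the (assumed) minority class 1 scales its loss further.
weighted = focal_loss_pixel([0.9, 0.1], target=1, class_weights=[1.0, 5.0])
```

The `(1 - p_t) ** gamma` factor shrinks the loss of well-classified pixels, so the abundant easy pixels no longer dominate the sum, while the class weights address the imbalance directly.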
The large negative values are already approximately zero as probabilities, and I would assume an Inf somewhere in the computation might have some bad side effects.
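As a small illustration of that concern, the plain-Python sketch below uses standard IEEE floating-point behavior; the value `-1000.0` is just an assumed example of a "large negative value". It shows that such a value already yields a probability of zero, while an actual `-inf` can turn into NaN as soon as it meets a zero or another infinity.

```python
import math

large_negative = -1000.0
neg_inf = float("-inf")

# Both underflow to exactly 0.0 once exponentiated.
prob_from_large = math.exp(large_negative)
prob_from_inf = math.exp(neg_inf)

# The finite value stays well behaved in later arithmetic ...
finite_product = 0.0 * large_negative   # zero (negative zero, equal to 0.0)

# ... while -inf produces NaN in common operations.
nan_product = 0.0 * neg_inf             # 0 * inf is undefined -> nan
nan_difference = neg_inf - neg_inf      # inf - inf is undefined -> nan

# A single NaN then propagates through any reduction.
total = sum([1.0, 2.0, nan_product])
```

Since the finite value gives the same probability without the NaN risk, it is usually the safer choice when a value only needs to be "effectively zero".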
Thanks, I'm glad my posts are helpful.