Loss and Metric Clarification

In the implementations of the Dice metric that I have seen, some repositories apply a threshold of 0.5 to the predictions when the final activation layer is a Sigmoid, e.g. the Eisen Framework Dice Metric. For multiclass segmentation, do I still need this threshold, given that the final activation layer is now a Softmax?
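To make the question concrete, here is a minimal pure-Python sketch of the two conventions being compared. It is not the Eisen implementation; the function names (`dice`, `binary_dice`, `multiclass_dice`) and the toy values are made up for illustration. It assumes that the multiclass case uses a per-pixel argmax over the Softmax probabilities, which is one common convention, rather than a fixed threshold.

```python
def dice(pred, target, eps=1e-7):
    """Dice coefficient between two flat lists of 0/1 labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2 * inter + eps) / (sum(pred) + sum(target) + eps)


def binary_dice(probs, target, threshold=0.5):
    # Sigmoid case: one probability per pixel, binarised with a threshold.
    pred = [1 if p > threshold else 0 for p in probs]
    return dice(pred, target)


def multiclass_dice(probs, target, num_classes):
    # Softmax case (assumed convention): one probability vector per pixel,
    # turned into a hard label with argmax, so no threshold is involved.
    labels = [max(range(num_classes), key=lambda c: p[c]) for p in probs]
    scores = []
    for c in range(num_classes):
        pred_c = [1 if label == c else 0 for label in labels]
        target_c = [1 if t == c else 0 for t in target]
        scores.append(dice(pred_c, target_c))
    return scores  # one Dice score per class


# Toy data: 4 pixels.
sigmoid_probs = [0.9, 0.4, 0.7, 0.2]
binary_target = [1, 0, 0, 0]
print(binary_dice(sigmoid_probs, binary_target))

softmax_probs = [
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.40, 0.35, 0.25],  # no class reaches 0.5, argmax still picks class 0
]
multiclass_target = [0, 1, 2, 1]
print(multiclass_dice(softmax_probs, multiclass_target, num_classes=3))
```

The last pixel in the toy data shows where the two conventions differ: with a 0.5 threshold applied per class it would receive no label at all, while argmax always assigns exactly one class.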

Also, when training a 2D U-Net or a deep learning model in general, does a lower training loss than validation loss necessarily imply a higher evaluation metric on training than on validation? In addition, which checkpoint should I treat as my best-performing model, the one that generalizes well to the testing dataset: the one with the lowest loss (on training/validation) or the one with the highest metric?

Finally, when using a single model for two segmentation tasks, each with its own criterion, which is the better or more advisable approach for backpropagation:

final_loss = lossA + lossB
final_loss.backward()

or

lossA.backward(retain_graph=True)
lossB.backward()

?
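For reference, here is a small PyTorch sketch that runs both options on the same toy input and compares the resulting gradients. The model (`TwoHeadNet`), the helper names, the tensor shapes and the choice of `BCEWithLogitsLoss` are all hypothetical and not taken from my actual code. It assumes both tasks share one encoder and that the two losses are weighted equally.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TwoHeadNet(nn.Module):
    """Hypothetical toy model: a shared encoder with one head per task."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(1, 4, kernel_size=3, padding=1)
        self.head_a = nn.Conv2d(4, 1, kernel_size=1)
        self.head_b = nn.Conv2d(4, 1, kernel_size=1)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))
        return self.head_a(feats), self.head_b(feats)


def grads_summed(model, criterion, x, target_a, target_b):
    # Option 1: add the losses, then a single backward pass.
    model.zero_grad()
    out_a, out_b = model(x)
    final_loss = criterion(out_a, target_a) + criterion(out_b, target_b)
    final_loss.backward()
    return [p.grad.clone() for p in model.parameters()]


def grads_separate(model, criterion, x, target_a, target_b):
    # Option 2: two backward passes; gradients accumulate in .grad.
    model.zero_grad()
    out_a, out_b = model(x)
    loss_a = criterion(out_a, target_a)
    loss_b = criterion(out_b, target_b)
    loss_a.backward(retain_graph=True)  # keep the shared graph for loss_b
    loss_b.backward()
    return [p.grad.clone() for p in model.parameters()]


model = TwoHeadNet()
criterion = nn.BCEWithLogitsLoss()
x = torch.randn(2, 1, 8, 8)
target_a = torch.rand(2, 1, 8, 8).round()  # random 0/1 masks
target_b = torch.rand(2, 1, 8, 8).round()

g_sum = grads_summed(model, criterion, x, target_a, target_b)
g_sep = grads_separate(model, criterion, x, target_a, target_b)
same = all(torch.allclose(a, b, atol=1e-6) for a, b in zip(g_sum, g_sep))
print("gradients match:", same)
```

In this toy setting the two options should give the same gradients up to floating-point error, since `backward()` accumulates into `.grad`. The practical difference would then be the second traversal of the shared graph and the need for `retain_graph=True`.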

Thanks in advance for your contributions.