Multi-class segmentation: sigmoid + BCE vs. softmax + cross-entropy

I want to know whether the following two training setups are equivalent:

  • 3-channel mask, one channel per class (each channel is 0 or 1):
    • sigmoid output → BCE loss (train)
    • sigmoid output → ge(0.5) → dice (val)
  • one-hot mask:
    • softmax output → cross-entropy loss (train)
    • softmax output → argmax → dice (val)
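
To make the comparison concrete, here is a minimal sketch of the two setups in plain Python, applied to two pixels with 3 classes. The logits, targets, and helper names (`bce_loss`, `ce_loss`, `predict_threshold`, `predict_argmax`, `dice`) are made up for illustration and are not taken from any library; in a real project these steps would be handled by the framework's own loss and tensor operations.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bce_loss(logits, target):
    # Setup 1 (train): each channel is an independent binary problem.
    probs = [sigmoid(z) for z in logits]
    terms = [t * math.log(p) + (1 - t) * math.log(1 - p)
             for p, t in zip(probs, target)]
    return -sum(terms) / len(terms)

def ce_loss(logits, target):
    # Setup 2 (train): classes compete through the softmax.
    true_class = target.index(1)
    return -math.log(softmax(logits)[true_class])

def predict_threshold(logits, thr=0.5):
    # Setup 1 (val): a pixel can end up with zero, one, or several classes.
    return [1 if sigmoid(z) >= thr else 0 for z in logits]

def predict_argmax(logits):
    # Setup 2 (val): a pixel always gets exactly one class.
    best = max(range(len(logits)), key=lambda c: logits[c])
    return [1 if c == best else 0 for c in range(len(logits))]

def dice(preds, targets, c):
    # Dice for class c over a list of per-pixel masks.
    inter = sum(p[c] * t[c] for p, t in zip(preds, targets))
    denom = sum(p[c] for p in preds) + sum(t[c] for t in targets)
    return 1.0 if denom == 0 else 2.0 * inter / denom

# Hypothetical logits for two pixels and their ground-truth masks.
pixel_logits = [[2.0, 1.0, -3.0], [-1.0, -0.5, -2.0]]
targets = [[1, 0, 0], [0, 1, 0]]

thr_preds = [predict_threshold(z) for z in pixel_logits]
arg_preds = [predict_argmax(z) for z in pixel_logits]

for z, t in zip(pixel_logits, targets):
    print("BCE:", bce_loss(z, t), "CE:", ce_loss(z, t))
print("threshold preds:", thr_preds)
print("argmax preds:   ", arg_preds)
print("dice, class 1:", dice(thr_preds, targets, 1), dice(arg_preds, targets, 1))
```

In this sketch the same logits give different validation masks: thresholding marks two classes for the first pixel and none for the second, while argmax assigns exactly one class to each. So at least on the validation side, the two setups are not guaranteed to produce the same dice score.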

Does the choice between them make a large difference to model training?
Thanks in advance.