Thank you for your response.
Your point about the correlation between future time steps is very reasonable, and it is something I had not thought about deeply before. However, I still have some doubts and would like to explore further whether this approach is applicable.
In semantic segmentation, the multi-channel output of U-Net is generally used for multi-class classification, where each pixel belongs to exactly one class. For a standard task with C classes, the output usually has shape (batch, C, H, W), and each channel gives the probability (typically after a softmax over the channel dimension) that a pixel belongs to the corresponding class.
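To make the multi-class reading concrete, here is a minimal sketch for a single pixel in plain Python. The logit values and the choice of C = 3 are made up for illustration, and a real model would apply the same operation along the channel dimension of the (batch, C, H, W) tensor.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of per-class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one pixel across C = 3 classes (illustrative values).
pixel_logits = [2.0, 0.5, -1.0]

# The channels compete: the probabilities sum to 1 for this pixel.
class_probs = softmax(pixel_logits)

# The pixel is assigned to exactly one class, the one with the highest probability.
predicted_class = max(range(len(class_probs)), key=class_probs.__getitem__)
```

The point of the sketch is that the channels are coupled through the normalization, so raising the probability of one class necessarily lowers the others.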
However, in my case the label of each pixel is not a single choice among mutually exclusive classes, but rather 7 independent binary labels. I am therefore unsure whether this approach aligns with the intended use of U-Net.
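To illustrate what I mean by 7 independent binary tasks, here is a minimal sketch for a single pixel in plain Python. It assumes one output channel per task, a separate sigmoid on each channel, and a binary cross-entropy averaged over the 7 channels. The logits and targets are made up, and this is only my understanding of how such a setup would look, not a confirmed recipe.

```python
import math

def sigmoid(x):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def binary_cross_entropy(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability and one 0/1 target."""
    return -(target * math.log(prob + eps)
             + (1 - target) * math.log(1.0 - prob + eps))

# Hypothetical logits and 0/1 targets for one pixel, one entry per binary task.
pixel_logits = [1.5, -0.3, 2.2, -1.0, 0.1, -2.5, 0.8]
pixel_targets = [1, 0, 1, 0, 1, 0, 1]

# Each channel is squashed independently, so the probabilities need not sum to 1.
probs = [sigmoid(z) for z in pixel_logits]

# Each task is thresholded on its own; several can be positive at once.
predictions = [int(p > 0.5) for p in probs]

# Average the per-task losses for this pixel.
loss = sum(binary_cross_entropy(p, t)
           for p, t in zip(probs, pixel_targets)) / len(probs)
```

Under these assumptions the output shape would still be (batch, 7, H, W), but the channels would no longer compete with each other, which is the difference from the standard multi-class case that I am asking about.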
Once again, thank you for your response.