Obtain intermediate segmentation result?

In image segmentation, we use an encoder to shrink the spatial size of the features and a decoder to enlarge the encoder output back to the original image size. A final conv op then transforms the feature map into a C-channel result, to which a softmax is applied later. Note that C is the number of categories in the dataset.
That’s the usual case. But what if I want to distinguish the pixels inside the encoder feature maps? In the raw image, we can distinguish pixels using the ground truth. Is there a proper mask for an intermediate feature map, and how do I get it?
I don’t want to enlarge the last encoder output with deconv ops one by one. Can I conv an intermediate feature map to the raw size directly to get the mask, then train it with a loss and backprop, so that every decoder layer generates a specific mask for that layer itself?

I don’t fully understand your question.
Do you want to use the encoder output as the segmentation prediction and just train the encoder without the decoder part?
If so, what do you mean by your last sentence?

Is there a decoder for each encoder layer?

Note that when you use the “small” encoder output to train your model, you would need to resize your segmentation mask to the appropriate size. This will yield either a very small segmentation prediction from your model or, if you resize the prediction back to the original size, probably a “blurred” segmentation.
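
To make the two options concrete, here is a minimal sketch, assuming PyTorch. The shapes, the 64-channel feature map, and the names `feat`, `mask`, and `head` are made up for illustration and are not taken from your model. It is only meant to show the resizing step, not a recommended architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_classes = 3  # C, hypothetical number of categories

# Stand-in for an intermediate encoder feature map: (N, channels, H/8, W/8).
feat = torch.randn(2, 64, 16, 16)
# Ground-truth class indices at the original image resolution: (N, H, W).
mask = torch.randint(0, num_classes, (2, 128, 128))

# A 1x1 conv maps the feature channels to C class logits,
# still at the feature-map resolution.
head = nn.Conv2d(64, num_classes, kernel_size=1)
logits = head(feat)  # (2, C, 16, 16)

# Option A: shrink the mask to the feature-map size.
# Nearest-neighbour interpolation keeps every value a valid class index.
small_mask = F.interpolate(
    mask[:, None].float(), size=logits.shape[-2:], mode="nearest"
)[:, 0].long()  # (2, 16, 16)
loss_small = F.cross_entropy(logits, small_mask)

# Option B: enlarge the logits to the mask size instead.
# Bilinear upsampling of a coarse map is what tends to look "blurred".
up_logits = F.interpolate(
    logits, size=mask.shape[-2:], mode="bilinear", align_corners=False
)
loss_full = F.cross_entropy(up_logits, mask)

# Either loss (or their sum, as here) can be backpropagated.
(loss_small + loss_full).backward()
```

With option A the prediction stays at 16x16, so fine detail in the mask is lost when it is shrunk. With option B the loss is computed at full resolution, but the prediction is only an interpolated version of the coarse map.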

I see. I think I misunderstood something. Thanks!