How do two decoders share the same encoder features? Do I need to use detach()?

I have a network that includes one encoder and two decoders. The two decoders share the same encoder. The paper says:

SO and CO branches - We take the downsampling branch of a U-Net as it is, however we split the upsampling branch into two halves, one to obtain the Region of Interest and the other for Complementary aka non region of interest. Losses here are negative dice for ROI and positive dice for Non-ROI region.

Assume that the encoder provides features like

encoder_feature = [f1, f2, f3, f4, f5]

So, both decoder_SO and decoder_CO use the encoder features to update the weights. Which option is correct?

decoder_SO = [up_sample(f1), up_sample(f2), up_sample(f3), up_sample(f4), up_sample(f5)]
decoder_CO = [up_sample(f1), up_sample(f2), up_sample(f3), up_sample(f4), up_sample(f5)]

Or using detach():

decoder_SO = [up_sample(f1), up_sample(f2), up_sample(f3), up_sample(f4), up_sample(f5)]
decoder_CO = [up_sample(f1.detach()), up_sample(f2.detach()), up_sample(f3.detach()), up_sample(f4.detach()), up_sample(f5.detach())]

If you want to train the encoder together with both decoders, the first variant is the one to go with.

Version 2 trains the encoder together with decoder_SO only; decoder_CO would still be trained, but its gradients would never reach the encoder.
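
Something like this minimal sketch would correspond to the first version; the Conv2d layers and the mean() losses are just placeholders standing in for your real U-Net blocks and dice losses:

import torch

# placeholder modules standing in for the encoder and the two upsampling branches
encoder = torch.nn.Conv2d(1, 8, 3, padding=1)
decoder_SO = torch.nn.Conv2d(8, 1, 3, padding=1)
decoder_CO = torch.nn.Conv2d(8, 1, 3, padding=1)

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder_SO.parameters()) + list(decoder_CO.parameters()),
    lr=1e-3)

x = torch.randn(2, 1, 64, 64)

features = encoder(x)           # shared encoder features, no detach()
out_so = decoder_SO(features)   # ROI branch
out_co = decoder_CO(features)   # non-ROI branch

loss_so = out_so.mean()         # placeholder for the ROI loss
loss_co = out_co.mean()         # placeholder for the non-ROI loss
loss = loss_so + loss_co        # single combined loss

optimizer.zero_grad()
loss.backward()                 # gradients from both decoders flow into the encoder
optimizer.step()

Since both decoder losses are summed into one loss, the backward pass accumulates the gradients from both branches in the shared encoder parameters.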


If I want to train the encoder using the information from both decoder_SO and decoder_CO, I need to use option 1, is that right? I am not clear about your answer for the second case.

Yes, you have to use Version 1.

The second version would detach the decoder's input from the encoder's computational graph. This means that no gradients could be propagated back through the encoder from that branch, and thus the encoder would only be trained by one decoder.
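
You can verify this with a small toy example (the Linear layers just stand in for the real encoder and decoder):

import torch

encoder = torch.nn.Linear(4, 4)      # stand-in for the encoder
decoder_co = torch.nn.Linear(4, 4)   # stand-in for decoder_CO

x = torch.randn(1, 4)
features = encoder(x)

out = decoder_co(features.detach())  # Version 2: detach before the decoder
out.mean().backward()

print(encoder.weight.grad)     # None -> no gradient reached the encoder
print(decoder_co.weight.grad)  # populated -> decoder_CO is still trained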

Hi, I’m trying to implement a similar architecture. I’m confused about what the loss function would look like. Any help/suggestion/advice would be greatly appreciated.
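
For what it's worth, my current guess based on the sentence quoted in the first post ("negative dice for ROI and positive dice for Non-ROI region") looks like the sketch below; soft_dice is just a hypothetical helper and the sign convention should be double-checked against the paper:

import torch

def soft_dice(pred, target, eps=1e-6):
    # soft dice coefficient; pred is expected to be in [0, 1]
    intersection = (pred * target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

# hypothetical branch outputs and targets, just to make the snippet runnable
pred_so = torch.sigmoid(torch.randn(2, 1, 64, 64))   # ROI branch output
pred_co = torch.sigmoid(torch.randn(2, 1, 64, 64))   # non-ROI branch output
target_roi = torch.randint(0, 2, (2, 1, 64, 64)).float()
target_non_roi = 1.0 - target_roi

loss_so = -soft_dice(pred_so, target_roi)       # "negative dice for ROI"
loss_co = soft_dice(pred_co, target_non_roi)    # "positive dice for Non-ROI"
loss = loss_so + loss_co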