I want to build an encoder–decoder network, with transposed convolutions in the decoder.
Is there a standard approach to building a decoder? Suppose I have an input image and pass it through a ResNet-50 based encoder (removing the last average pool and fully connected layers) — what would be a nice way to build the decoder?
I understand that encoder–decoder correspondence doesn't signify anything, but how would you build the decoder? Would you try to build a corresponding transposed convolution layer for every convolution in the encoder? And would you use the stride of the transposed convolution to perform the inverse of max pooling?
What are the good/standard design choices?
Hi @Naman-ntc! I am working on a side project and have the same issue. How did you end up building that decoder? Thank you!
Hi @Naman-ntc, did you build the network? @franciscocms I have imported a U-Net decoder from the Timm segmentation library, though I have not finished working with it yet.
Currently I am facing the following problems:
- I want to take the output from ResNet-18 before the last average pool layer and send it to the decoder. I will use the decoder output to calculate an L1 loss against the input image.
- I want to remove only the last linear layer and replace it with a linear layer for binary classification, since my problem requires binary classification. I will then use this output to calculate a BCE loss.
Finally, these two losses will be summed in my final model. What is the best way to write this model in code?