I want to build an encoder-decoder network with transposed convolutions in the decoder.
Is there a standard approach to building a decoder? Suppose I have an input image and pass it through a ResNet-50-based encoder (with the final average pooling and fully connected layers removed). What would be a good way to build the decoder?
I understand that a layer-by-layer correspondence between encoder and decoder doesn't signify anything in itself, but how would you build the decoder? Would you try to build a corresponding transposed convolution layer for every convolution in the encoder? And would you use the stride of the transposed convolution to perform the inverse of max pooling?
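To make the stride question concrete, here is a rough sketch of the size arithmetic I have in mind. It is only an illustration under my own assumptions: a 224x224 input (so the ResNet-50 encoder, which downsamples by a factor of 32, yields a 7x7 feature map) and five identical transposed convolutions with kernel 4, stride 2, padding 1. The helper names `conv_transpose_out` and `decoder_sizes` are made up for this example; the formula is the output-size formula given in the PyTorch `ConvTranspose2d` documentation.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def decoder_sizes(start, num_stages, kernel=4, stride=2, padding=1):
    """Spatial size after each of num_stages identical transposed convolutions.

    The kernel/stride/padding defaults are an assumed configuration that
    exactly doubles the spatial size at every stage.
    """
    sizes = [start]
    for _ in range(num_stages):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes


# Assumed setup: 224x224 input, encoder output of 7x7 (total stride 32).
print(decoder_sizes(7, 5))  # prints [7, 14, 28, 56, 112, 224]
```

So five stride-2 stages would undo the encoder's total downsampling without mirroring every encoder convolution one for one, which is part of why I am unsure whether a strict layer-by-layer mirror is needed.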
What are the good/standard design choices?