I'm still trying to understand my decoder part. My input image is 512x512 RGB, and when I check the encoder output of each ResNet block I get:
layer1 = (64, 256, 256)
layer2 = (128, 128, 128)
layer3 = (256, 64, 64)
layer4 = (512, 32, 32)
I use a bottleneck, and its output is (1024, 32, 32).
My question is: when I use a decoder like the one in the basic U-Net, it can't bring the resolution back to 512x512 at the end, only to 256x256. Am I right to use ConvTranspose2d for upsampling and then F.interpolate in the forward pass?
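As far as I understand, the ConvTranspose2d I'm using doubles the spatial size exactly, since with kernel_size=2 and stride=2 (no padding) the output size is (H_in - 1) * 2 + 2 = 2 * H_in. A quick sanity check of that assumption:

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
x = torch.randn(1, 1024, 32, 32)  # bottleneck-shaped input
print(up(x).shape)                # torch.Size([1, 512, 64, 64])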
Here's my decoder code:
import torch
import torch.nn as nn

class Dec_Block(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Up-convolution: doubles the spatial size, halves the channels
        self.upconv = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
        # LeakyReLU
        self.relu = nn.LeakyReLU()
        # Batch normalization (applied to up_x, so out_channels must equal in_channels // 2)
        self.bn = nn.BatchNorm2d(out_channels)
        # Basic block after concatenating with the skip connection
        self.conv = BasicBlock(in_channels // 2 + out_channels, out_channels)

    def forward(self, inputs, skip):
        # Upsample the decoder feature map
        up_x = self.upconv(inputs)
        up_x = self.relu(up_x)
        up_x = self.bn(up_x)
        # Resize the encoder skip to match the upsampled map, then fuse
        skip = nn.functional.interpolate(skip, size=up_x.size()[2:], mode='bilinear', align_corners=True)
        x = torch.cat([up_x, skip], dim=1)
        x = self.conv(x)
        return x
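To make the question concrete, this is roughly how I plan to wire the blocks together and the shapes I expect at each step. BasicBlock below is just a stand-in (two 3x3 convs with padding=1) so the snippet runs on its own; my real block may differ, and the 1-channel 1x1 head at the end is only there to complete the example:

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # Stand-in for my real BasicBlock: two 3x3 convs that keep the spatial size.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(),
        )

    def forward(self, x):
        return self.block(x)

# Decoder blocks sized to the encoder outputs listed above
d4 = Dec_Block(1024, 512)               # takes the bottleneck + layer4 skip
d3 = Dec_Block(512, 256)                # + layer3 skip
d2 = Dec_Block(256, 128)                # + layer2 skip
d1 = Dec_Block(128, 64)                 # + layer1 skip
head = nn.Conv2d(64, 1, kernel_size=1)  # illustrative 1-class head

bottleneck = torch.randn(1, 1024, 32, 32)
layer4 = torch.randn(1, 512, 32, 32)
layer3 = torch.randn(1, 256, 64, 64)
layer2 = torch.randn(1, 128, 128, 128)
layer1 = torch.randn(1, 64, 256, 256)

x = d4(bottleneck, layer4)  # -> (1, 512, 64, 64)
x = d3(x, layer3)           # -> (1, 256, 128, 128)
x = d2(x, layer2)           # -> (1, 128, 256, 256)
x = d1(x, layer1)           # -> (1, 64, 512, 512)
out = head(x)
print(out.shape)            # torch.Size([1, 1, 512, 512])

With the stand-in block the shapes work out as commented; I mainly want to confirm that resizing the skips with F.interpolate like this is a sensible way to get back to 512x512.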