I'm still trying to understand my decoder part. My input image is 512x512 RGB, and when I check the encoder output of each ResNet block I get:
layer1 = (64, 256, 256)
layer2 = (128, 128, 128)
layer3 = (256, 64, 64)
layer4 = (512, 32, 32)
I use a bottleneck, and its output is (1024, 32, 32).
My question is: when I use a decoder like the one in the basic U-Net, it can't bring the resolution back to 512x512 at the end, only to 256x256. Am I right to use ConvTranspose2d for upsampling and then F.interpolate in the forward pass?
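As far as I understand, the ConvTranspose2d I'm using doubles the spatial size exactly, since with kernel_size=2 and stride=2 (no padding) the output size is (H_in - 1) * 2 + 2 = 2 * H_in. A quick sanity check of that assumption:

import torch
import torch.nn as nn

up = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
x = torch.randn(1, 1024, 32, 32)  # bottleneck-shaped input
print(up(x).shape)                # torch.Size([1, 512, 64, 64])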
Here's my decoder code:
import torch
import torch.nn as nn

class Dec_Block(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        # Up-convolution: doubles the spatial size, halves the channels
        self.upconv = nn.ConvTranspose2d(in_channels, in_channels // 2, kernel_size=2, stride=2)
        # LeakyReLU
        self.relu = nn.LeakyReLU()
        # Batch normalization (applied to up_x, so out_channels must equal in_channels // 2)
        self.bn = nn.BatchNorm2d(out_channels)
        # Basic block after concatenating with the skip connection
        self.conv = BasicBlock(in_channels // 2 + out_channels, out_channels)

    def forward(self, inputs, skip):
        # Upsample the decoder feature map
        up_x = self.upconv(inputs)
        up_x = self.relu(up_x)
        up_x = self.bn(up_x)
        # Resize the encoder skip to match the upsampled map, then fuse
        skip = nn.functional.interpolate(skip, size=up_x.size()[2:], mode='bilinear', align_corners=True)
        x = torch.cat([up_x, skip], dim=1)
        x = self.conv(x)
        return x
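To make the question concrete, this is roughly how I plan to wire the blocks together and the shapes I expect at each step. BasicBlock below is just a stand-in (two 3x3 convs with padding=1) so the snippet runs on its own; my real block may differ, and the 1-channel 1x1 head at the end is only there to complete the example:

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    # Stand-in for my real BasicBlock: two 3x3 convs that keep the spatial size.
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.LeakyReLU(),
        )

    def forward(self, x):
        return self.block(x)

# Decoder blocks sized to the encoder outputs listed above
d4 = Dec_Block(1024, 512)               # takes the bottleneck + layer4 skip
d3 = Dec_Block(512, 256)                # + layer3 skip
d2 = Dec_Block(256, 128)                # + layer2 skip
d1 = Dec_Block(128, 64)                 # + layer1 skip
head = nn.Conv2d(64, 1, kernel_size=1)  # illustrative 1-class head

bottleneck = torch.randn(1, 1024, 32, 32)
layer4 = torch.randn(1, 512, 32, 32)
layer3 = torch.randn(1, 256, 64, 64)
layer2 = torch.randn(1, 128, 128, 128)
layer1 = torch.randn(1, 64, 256, 256)

x = d4(bottleneck, layer4)  # -> (1, 512, 64, 64)
x = d3(x, layer3)           # -> (1, 256, 128, 128)
x = d2(x, layer2)           # -> (1, 128, 256, 256)
x = d1(x, layer1)           # -> (1, 64, 512, 512)
out = head(x)
print(out.shape)            # torch.Size([1, 1, 512, 512])

With the stand-in block the shapes work out as commented; I mainly want to confirm that resizing the skips with F.interpolate like this is a sensible way to get back to 512x512.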