I apologize in advance if this is trivial, but I don't have much experience with segmentation networks or PyTorch.
I am participating in the ICLR Reproducibility Challenge 2018 and I am trying to reproduce the results of the submission "Adversarial Learning for Semi-Supervised Semantic Segmentation".
I am confused about how to upsample the feature map produced by the network so that it matches the size of the input image. Here is how the paper describes the segmentation network:
"Segmentation network. We adopt the DeepLab-v2 (Chen et al., 2017) framework with ResNet-101 (He et al., 2016) model pre-trained on the ImageNet dataset (Deng et al., 2009) as our segmentation baseline network. However, we do not employ the multi-scale fusion proposed in Chen et al. (2017) due to the memory concern. Following the practice of recent work on semantic segmentation (Chen et al., 2017; Yu & Koltun, 2016), we remove the last classification layer and modify the stride of the last two convolution layers from 2 to 1, making the resolution of the output feature maps effectively 1/8 times the input image size. To enlarge the receptive fields, we apply the dilated convolution (Yu & Koltun, 2016) in conv4 and conv5 layers with a stride of 2 and 4, respectively. After the last layer, we employ the Atrous Spatial Pyramid Pooling (ASPP) proposed in Chen et al. (2017) as the final classifier. Finally, we apply an up-sampling layer along with the softmax output to match the size of the input image."
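To make sure I understand the stride/dilation change described in the quote, here is a small toy sketch I put together (my own made-up channel counts and spatial sizes, not the actual ResNet-101 layers): a stride-2 conv halves the resolution, while a stride-1 dilated conv keeps the resolution and still enlarges the receptive field.

```python
import torch
import torch.nn as nn

# Toy input: batch 1, 64 channels, 80x80 spatial size (arbitrary, for illustration)
x = torch.randn(1, 64, 80, 80)

# A stride-2 conv halves the spatial resolution
conv_stride2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)
print(conv_stride2(x).shape)  # torch.Size([1, 64, 40, 40])

# Stride 1 with dilation 2 keeps the resolution while covering a 5x5
# effective receptive field (padding chosen to preserve the spatial size)
conv_dilated = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=2, dilation=2)
print(conv_dilated(x).shape)  # torch.Size([1, 64, 80, 80])
```

If my reading is right, this is why the modified network outputs 1/8 of the input resolution instead of 1/32: the last two downsampling stages are replaced by dilated, stride-1 convolutions.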
I tried looking at the DeepLab-v2 version of ResNet-101 but I couldn't work out which layers are conv4 and conv5. So, instead, I used the network defined here: https://github.com/speedinghzl/Pytorch-Deeplab. While training on the PascalVOC dataset, the input image is first randomly scaled and cropped to 321x321. For this size, the network produces an output map of size 41x41. I tried using torch.nn.Upsample(size=(321, 321)) to map this 41x41 feature map to 321x321, but it gives me an error saying that the output size should be a multiple of the input size (which makes sense).
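For concreteness, here is a minimal repro of the shapes involved. I am using torch.nn.functional.interpolate here (which I believe is the functional counterpart of nn.Upsample in more recent PyTorch versions); the 21 channels are my assumption of one score map per PascalVOC class:

```python
import torch
import torch.nn.functional as F

# Dummy 41x41 score map from the segmentation network:
# (batch, num_classes, H/8-ish, W/8-ish); 21 classes assumed for PascalVOC
scores = torch.randn(1, 21, 41, 41)

# Bilinear upsampling to the 321x321 input resolution
upsampled = F.interpolate(scores, size=(321, 321), mode='bilinear', align_corners=True)
print(upsampled.shape)  # torch.Size([1, 21, 321, 321])
```

This runs for me without the multiple-of-input-size complaint, but I am not sure whether bilinear interpolation to an arbitrary target size is the right way to reproduce the paper's up-sampling layer.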
My question is, how can I upsample my segmentation network output feature map to the size of my input image?