Upsampling Semantic Segmentation


(Mohit Sharma) #1

I apologize in advance if this is very trivial but I don’t have a lot of experience in segmentation networks and pytorch.

I am participating in the ICLR Reproducibility Challenge 2018, and I am trying to reproduce the results in the submission “Adversarial Learning for Semi-Supervised Semantic Segmentation”.

I am confused about how to upsample the feature map produced by a network to match the size of the input image. This is how the paper explains the segmentation network.

“Segmentation network. We adopt the DeepLab-v2 (Chen et al., 2017) framework with ResNet-101 (He et al., 2016) model pre-trained on the ImageNet dataset (Deng et al., 2009) as our segmentation baseline network. However, we do not employ the multi-scale fusion proposed in Chen et al. (2017) due to the memory concern. Following the practice of recent work on semantic segmentation (Chen et al., 2017; Yu & Koltun, 2016), we remove the last classification layer and modify the stride of the last two convolution layers from 2 to 1, making the resolution of the output feature maps effectively 1/8 times the input image size. To enlarge the receptive fields, we apply the dilated convolution (Yu & Koltun, 2016) in conv4 and conv5 layers with a stride of 2 and 4, respectively. After the last layer, we employ the Atrous Spatial Pyramid Pooling (ASPP) proposed in Chen et al. (2017) as the final classifier. Finally, we apply an up-sampling layer along with the softmax output to match the size of the input image.”

I tried looking at the DeepLab-v2 version of ResNet-101, but I couldn’t figure out which layers are conv4 and conv5. So, instead, I used the network defined here: https://github.com/speedinghzl/Pytorch-Deeplab. While training on the PascalVOC dataset, the input image is first randomly scaled and cropped to 321x321. For this size, the network produces an output map of size 41x41. I tried using torch.nn.Upsample(size=(321, 321)) to map this 41x41 feature map to 321x321, but it gives me an error that the output size should be a multiple of the input size (which makes sense).

My question is, how can I upsample my segmentation network output feature map to the size of my input image?


(Federico Pala) #2

Hey, you want to go from 41x41 to 321x321? If you upsample by a factor of 8 with ConvTranspose2d you get 328x328. Then with a conv filter of size 8x8 (stride 1, no padding), you should get 321x321.
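A minimal sketch of that shape arithmetic (the channel count of 21 is just an assumption for PascalVOC classes): (41 - 1) * 8 + 8 = 328 from the transposed conv, then 328 - 8 + 1 = 321 from the plain conv.

```python
import torch
import torch.nn as nn

num_classes = 21  # assumed channel count (PascalVOC)
x = torch.randn(1, num_classes, 41, 41)

# Stride-8 transposed conv: (41 - 1) * 8 + 8 = 328
up = nn.ConvTranspose2d(num_classes, num_classes, kernel_size=8, stride=8)
# 8x8 conv, stride 1, no padding: 328 - 8 + 1 = 321
crop = nn.Conv2d(num_classes, num_classes, kernel_size=8)

y = crop(up(x))
print(y.shape)  # torch.Size([1, 21, 321, 321])
```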


(Mohit Sharma) #3

Hey! Thanks for the suggestion. This would work for my training scheme where I know that my input is 321x321.

But in the inference phase, the input image size is not fixed, and I want a cleaner way where I can upsample by simply specifying the target size.

For example, this is the behavior I want

feature_map = net(input)  # the size of the feature map depends on the size of input
upsample = torch.nn.Upsample(size=input_size)  # input_size is the spatial size of input
output = upsample(feature_map)
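A runnable sketch of this behavior, assuming a recent PyTorch where torch.nn.functional.interpolate accepts an arbitrary target size (bilinear mode is my assumption; the feature map here is a stand-in for net(input)):

```python
import torch
import torch.nn.functional as F

input = torch.randn(1, 3, 321, 321)          # any input resolution works
feature_map = torch.randn(1, 21, 41, 41)      # stand-in for net(input)

# Upsample to exactly the spatial size of the input image
output = F.interpolate(feature_map, size=input.shape[2:],
                       mode='bilinear', align_corners=True)
print(output.shape)  # torch.Size([1, 21, 321, 321])
```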

Please write back if I didn’t make something clear.