Upsampling Semantic Segmentation

(Mohit Sharma) #1

I apologize in advance if this is trivial, but I don't have much experience with segmentation networks or PyTorch.

I am participating in the ICLR 2018 Reproducibility Challenge and I am trying to reproduce the results of the submission “Adversarial Learning for Semi-Supervised Semantic Segmentation”.

I am confused about how to upsample the feature map produced by a network to match the size of the input image. This is how the paper explains the segmentation network.

“Segmentation network. We adopt the DeepLab-v2 (Chen et al., 2017) framework with ResNet-101 (He et al., 2016) model pre-trained on the ImageNet dataset (Deng et al., 2009) as our segmentation baseline network. However, we do not employ the multi-scale fusion proposed in Chen et al. (2017) due to the memory concern. Following the practice of recent work on semantic segmentation (Chen et al., 2017; Yu & Koltun, 2016), we remove the last classification layer and modify the stride of the last two convolution layers from 2 to 1, making the resolution of the output feature maps effectively 1/8 times the input image size. To enlarge the receptive fields, we apply the dilated convolution (Yu & Koltun, 2016) in conv4 and conv5 layers with a stride of 2 and 4, respectively. After the last layer, we employ the Atrous Spatial Pyramid Pooling (ASPP) proposed in Chen et al. (2017) as the final classifier. Finally, we apply an up-sampling layer along with the softmax output to match the size of the input image.”

I tried looking at the DeepLab-v2 version of ResNet-101 and I couldn't work out which layers are conv4 and conv5. So, instead, I used the network defined here. While training on the PASCAL VOC dataset, the input image is first randomly scaled and cropped to 321x321. For this size, the network produces an output map of size 41x41. I tried using torch.nn.Upsample(size=(321, 321)) to map this 41x41 feature map to 321x321, but it gives me an error saying that the output size should be a multiple of the input size (which makes sense).

My question is, how can I upsample my segmentation network output feature map to the size of my input image?

(Federico Pala) #2

Hey, you want to go from 41x41 to 321x321? If you upsample by a factor of 8 with a ConvTranspose2d (kernel size 8, stride 8), you get 328x328. Then apply a conv filter of size 8x8 with no padding, and you should get 321x321.
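
The arithmetic behind this suggestion can be checked with the standard output-size formulas for transposed and regular convolutions. The sketch below is plain Python, and the layer settings (kernel 8, stride 8, no padding for the transposed convolution; kernel 8, stride 1, no padding for the convolution) are assumptions, since the post does not spell them out.

```python
def conv_transpose_out(size, kernel, stride, padding=0, output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# Assumed settings (not stated in the post): kernel 8, stride 8, no padding.
upsampled = conv_transpose_out(41, kernel=8, stride=8)
# Assumed settings: 8x8 kernel, stride 1, no padding.
final = conv_out(upsampled, kernel=8)
print(upsampled, final)  # 328 321
```

With these settings a feature map of side n always comes out at 8n - 7, so the result matches the input only when the network's output side happens to be (input side + 7) / 8, as it does for 321.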

(Mohit Sharma) #3

Hey! Thanks for the suggestion. This would work for my training scheme where I know that my input is 321x321.

But in the inference phase, the input image size is not fixed and I want a cleaner way where I can upsample by simply specifying the target size.

For example, this is the behavior I want:

input_size = input.shape[-2:]  # (H, W) of the input image
feature_map = net(input)  # the size of the feature map depends on the size of input
output = torch.nn.Upsample(size=input_size)(feature_map)  # resize to the input's (H, W)

Please write back if I didn’t make something clear.
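
One way to get the target-size behavior described above is torch.nn.functional.interpolate, which accepts an explicit output size and does not need it to be a multiple of the feature map size. This is a minimal sketch, not the paper's implementation: the helper name upsample_to_input, the random stand-in tensors, the 21-class channel count, and the choice of bilinear interpolation are all assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def upsample_to_input(feature_map, image):
    """Resize (N, C, h, w) class scores to the spatial size of an (N, 3, H, W) image."""
    # Bilinear mode is an assumption; the quoted paper only says "an up-sampling layer".
    return F.interpolate(
        feature_map, size=image.shape[-2:], mode="bilinear", align_corners=False
    )


# Hypothetical stand-ins for real data and for the network output:
# 21 channels (PASCAL VOC classes incl. background), feature maps at roughly 1/8 resolution.
for height, width in [(321, 321), (375, 500)]:
    image = torch.randn(1, 3, height, width)
    scores = torch.randn(1, 21, (height - 1) // 8 + 1, (width - 1) // 8 + 1)
    output = upsample_to_input(scores, image)
    print(tuple(scores.shape[-2:]), "->", tuple(output.shape[-2:]))
```

Because the target size is read from the input tensor at call time, the same function works for the fixed 321x321 training crops and for arbitrarily sized images at inference.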