Hey, I wonder if it is possible to define a pool layer which output the median value over an specific area instead of max value or mean value only using available pytorch app function? I think this would be helpful but I have no idea how to implement it. Can you help me with some advice? Thank you!!!
If you’re on images (or chw in general) you can use torch.nn.Fold/Unfold to arrange the items you want to pool along one dimension, do your pooling, and then rearrange back.
It’s not terribly fast, but you can always optimize when its working for you.
If it can help, I think a simple implementation of the median pooling using fold/unfold can be
import torch as th import torch.nn as thNn import torch.nn.functional as thFn import math def unpack_param_2d(param): try: p_H, p_W = param, param except: p_H, p_W = param, param return p_H, p_W def median_pool_2d(input, kernel_size, stride, padding, dilation): #Input should be 4D (BCHW) assert(input.dim() == 4) #Get input dimensions b_size, c_size, h_size, w_size = input.size() #Get input parameters k_H, k_W = unpack_param_2d(kernel_size) s_H, s_W = unpack_param_2d( stride) p_H, p_W = unpack_param_2d( padding) d_H, d_W = unpack_param_2d( dilation) #First we unfold all the (kernel_size x kernel_size) patches unf_input = thFn.unfold(input, kernel_size, dilation, padding, stride) #Reshape it so that each patch is a column row_unf_input = unf_input.reshape(b_size, c_size, k_H*k_W, -1) #Apply median operation along the columns for each channel separately med_unf_input, med_unf_indexes = th.median(row_unf_input, dim = 2, keepdim=True) #Restore original shape out_W = math.floor(((w_size + (2 * p_W) - (d_W * (k_W - 1)) - 1) / s_W) + 1) out_H = math.floor(((h_size + (2 * p_H) - (d_H * (k_H - 1)) - 1) / s_H) + 1) return med_unf_input.reshape(b_size, c_size, out_H, out_W)
Thank you very much, it works:+1:
Thank you for your advice