Writing this function in PyTorch

Hello all, I'd like to write the function below in PyTorch so that it fully utilizes my computer's GPU and speeds up the calculation. Currently this step is really slowing my algorithm down, so any help is greatly appreciated. Some notes: for every point of the input feature map, the function builds a normalized Gaussian kernel whose sigma grows linearly with the Euclidean distance between the center of the feature map and the current point of interest, and whose window size is three times that sigma (truncated to an integer). The output at that point is the feature map convolved with that kernel.

import math
import torch
import torch.nn.functional as F

def pseudo_colliculus_map(neural_map):
    # neural_map is expected to have shape (N, 1, H, W); only batch element 0 is read
    dimensions = neural_map.size()
    device = neural_map.device
    mapped = torch.zeros(dimensions[2], dimensions[3], device=device)
    # Center of the feature map, kept as plain Python ints (no GPU tensor needed)
    localizer = (dimensions[2] // 2, dimensions[3] // 2)
    for i in range(dimensions[2]):
        for j in range(dimensions[3]):
            # Sigma grows linearly with the distance from the center
            euc = math.sqrt((localizer[0] - i)**2 + (localizer[1] - j)**2)
            sigma = 0.06 * euc + 0.4
            size = int(sigma * 3)
            # Gaussian kernel of shape (size, size), centered at (size - 1) / 2
            coords = torch.arange(size, dtype=torch.float32, device=device) - (size - 1) / 2
            x, y = coords.view(-1, 1), coords.view(1, -1)
            kernel = torch.exp(-(x**2 + y**2) / (2 * sigma**2)) / (2 * math.pi * sigma**2)
            kernel = kernel / kernel.sum()
            weights = kernel.view(1, 1, size, size)
            # Padding as in the original post; size // 2 would center odd-sized kernels on (i, j)
            pad = math.ceil(size / 2)
            mapping = F.conv2d(neural_map, weights, stride=1, padding=pad)
            mapped[i, j] = mapping[0, 0, i, j]

    return mapped

Any help would be greatly appreciated.

Any suggestions on how I could implement this more efficiently?

Hi, have you solved it? I ran into a similar problem.

Hi, I'd guess it is the large number of .cuda() calls inside the for loop that makes it so slow. How about initializing the variable "size" outside the loop and only assigning to it inside the loop?
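Going one step further than hoisting the .cuda() calls, here is a rough sketch of how the per-pixel loop might be replaced altogether. The loop in the question runs a full conv2d for every pixel and then keeps a single value from it, so most of the work is thrown away. The sketch instead groups pixels by kernel size and computes only the weighted patch sum each pixel needs. The function name `pseudo_colliculus_map_grouped` is made up for illustration, and the code assumes a floating-point input of shape (N, 1, H, W) of which only batch element 0 is read, as in the question. It keeps the original padding of ceil(size / 2), so it should reproduce the loop's output rather than re-center the kernel. I have not benchmarked it.

```python
import math
import torch
import torch.nn.functional as F


def pseudo_colliculus_map_grouped(neural_map):
    """Hypothetical vectorized variant of the loop in the question.

    Assumes neural_map is a float tensor of shape (N, 1, H, W).
    """
    _, _, H, W = neural_map.shape
    device = neural_map.device
    image = neural_map[0, 0]

    # Per-pixel distance to the center, sigma and kernel size, computed once.
    # float64 is used here so the truncation to an integer size matches the
    # Python-float arithmetic of the original loop.
    ii, jj = torch.meshgrid(
        torch.arange(H, device=device),
        torch.arange(W, device=device),
        indexing="ij",
    )
    euc = torch.sqrt(((ii - H // 2) ** 2 + (jj - W // 2) ** 2).double())
    sigma = 0.06 * euc + 0.4
    size = (sigma * 3).long()

    mapped = torch.zeros(H, W, dtype=image.dtype, device=device)
    for s in torch.unique(size).tolist():
        pad = math.ceil(s / 2)  # same padding as the original loop
        padded = F.pad(image, (pad, pad, pad, pad))
        mask = size == s
        pi, pj = ii[mask], jj[mask]
        sig = sigma[mask].to(image.dtype).view(-1, 1, 1)

        # Gather the (s, s) window each selected pixel would see in conv2d.
        offs = torch.arange(s, device=device)
        rows = pi.view(-1, 1, 1) + offs.view(1, -1, 1)
        cols = pj.view(-1, 1, 1) + offs.view(1, 1, -1)
        patches = padded[rows, cols]  # shape (P, s, s)

        # One Gaussian kernel per pixel. The 1 / (2 * pi * sigma^2) factor is
        # omitted because it cancels when the kernel is normalized.
        c = offs.to(image.dtype) - (s - 1) / 2
        d2 = c.view(1, -1, 1) ** 2 + c.view(1, 1, -1) ** 2
        kernels = torch.exp(-d2 / (2 * sig ** 2))
        kernels = kernels / kernels.sum(dim=(1, 2), keepdim=True)

        mapped[mask] = (kernels * patches).sum(dim=(1, 2))
    return mapped
```

The remaining Python loop runs once per distinct kernel size rather than once per pixel, and everything inside it stays on whatever device neural_map is on. Memory grows with the number of pixels in a group times size squared, so for very large feature maps the groups might need to be processed in chunks.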