My implementation of cv2.filter2D produces different output from OpenCV

I read about kernel correlation and wrote my own version of cv2.filter2D, but for a test input and kernel, its output does not match the output of cv2.filter2D(). What am I missing here?

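import torch

# Inputs are two tensors: image is 3D (x, y, channels), kernel is 2D.
def my_filter2D(image, kernel):
    kx, ky = kernel.shape
    x_n = kx//2
    y_n = ky//2  # half-width along y comes from ky, not kx
    # torch.nn.functional.pad fills with zeros by default (mode='constant')
    padded_image = torch.nn.functional.pad(image, (0, 0, ky//2, ky//2, kx//2, kx//2))
    filtimg = torch.zeros_like(image)  # separate output buffer, input stays untouched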
    px, py, nc = padded_image.shape
    for x in range(px):
        for y in range(py):
            for c in range(nc):
                if x - x_n < 0:
                    continue
                if x + x_n >= px:
                    continue
                if y - y_n < 0:
                    continue
                if y + y_n >= py:
                    continue
                total = 0.0
                image_n = padded_image[x-x_n:x+x_n+1, y-y_n:y+y_n+1,c]
                for x_i in range(kx):
                    for y_i in range(ky):
                        total += image_n[x_i,y_i]*kernel[x_i,y_i]
                filtx = x-x_n
                filty = y-y_n
                filtimg[filtx,filty,c] = total
    return filtimg

Here are a sample input and kernel from a Stack Overflow answer:

import numpy as np
import cv2


a = np.array([[0.0,0.0,0.0,0.0,0.0],
              [0.0,0.0,0.0,0.0,0.0],
              [10.0,0.0,0.0,0.0,0.0],
              [20.0,20.0,20.0,0.0,0.0],
              [30.0,30.0,30.0,30.0,30.0]], dtype=np.float32)

kernel = np.ones((3,3), dtype=np.float32)

filtered_a = cv2.filter2D(a, -1, kernel)

The cv2.filter2D() call above produces:

array([[  0.,   0.,   0.,   0.,   0.],
       [ 10.,  10.,   0.,   0.,   0.],
       [ 70.,  70.,  40.,  20.,   0.],
       [160., 160., 130., 110.,  90.],
       [210., 210., 170., 130.,  90.]], dtype=float32)

But the output of my filter2D implementation does not match the cv2.filter2D() output above. How can I fix my code? Here is how I call it:

a = np.array([[0.0,0.0,0.0,0.0,0.0],
              [0.0,0.0,0.0,0.0,0.0],
              [10.0,0.0,0.0,0.0,0.0],
              [20.0,20.0,20.0,0.0,0.0],
              [30.0,30.0,30.0,30.0,30.0]], dtype=np.float32)
testkernel = np.ones((3,3), dtype=np.float32)
a3d = a.reshape(5,5,1)
my_filter2d_a = my_filter2D(torch.from_numpy(a3d), torch.from_numpy(testkernel))

And its output:
tensor([[  0.,   0.,   0.,   0.,   0.],
        [ 10.,  10.,   0.,   0.,   0.],
        [ 50.,  70.,  40.,  20.,   0.],
        [110., 160., 130., 110.,  60.],
        [100., 150., 130., 110.,  60.]])

Please point me to the correct algorithm for implementing cv2.filter2D().
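
One thing worth checking: the two results agree everywhere except along the image border, which suggests the border handling differs rather than the correlation itself. My function pads with zeros, while cv2.filter2D() by default (as far as I understand) uses BORDER_REFLECT_101, which mirrors the image around the edge pixel without repeating it. Below is a minimal NumPy-only sketch of that idea, not OpenCV's actual implementation. The name filter2d_reflect101 is made up for illustration, and the sketch assumes a single-channel 2D image and an odd-sized kernel.

```python
import numpy as np

def filter2d_reflect101(image, kernel):
    """Correlate a 2D image with an odd-sized 2D kernel.

    Hypothetical helper: borders are mirrored without repeating the
    edge pixel, which is assumed to match cv2's BORDER_REFLECT_101.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    # np.pad mode="reflect" mirrors around the edge pixel: [1,2,3] -> [2,1,2,3,2]
    padded = np.pad(image, ((ph, ph), (pw, pw)), mode="reflect")
    out = np.zeros(image.shape, dtype=np.float32)
    rows, cols = image.shape
    for i in range(rows):
        for j in range(cols):
            # Window of the padded image centered on output pixel (i, j)
            window = padded[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out

a = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0, 0.0],
              [10.0, 0.0, 0.0, 0.0, 0.0],
              [20.0, 20.0, 20.0, 0.0, 0.0],
              [30.0, 30.0, 30.0, 30.0, 30.0]], dtype=np.float32)
kernel = np.ones((3, 3), dtype=np.float32)

result = filter2d_reflect101(a, kernel)
print(result)
```

On the sample input this reproduces the cv2.filter2D() array shown above. Going the other way, passing borderType=cv2.BORDER_CONSTANT to cv2.filter2D() should make OpenCV pad with zeros, which would be a way to confirm that the padding is the only difference.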