How to implement a spatially dependent convolution?

I’m trying to implement a convolution using matrix multiplication or some other efficient approach.

I have a spatially dependent kernel (a different S x S kernel at every pixel) and an input tensor:
K dim=(H,W,S*S), e.g. S=5 (5x5 convolution)
T dim=(H,W,C)

After the convolution, I want to get:
R dim=(H,W,C)

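Currently, I do a matrix multiplication at each point, like this in NumPy for testing:
for y in range(H):
    for x in range(W):
        # get_patch returns the S*S patch of T centered at (x, y), shape (S*S, C)
        patch = get_patch(x, y)
        # weight the patch by the per-pixel kernel: (S*S,) @ (S*S, C) -> (C,)
        R[y, x] = np.matmul(K[y, x], patch)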

However, this approach runs on the CPU.
I want to execute it on the GPU using PyTorch.
Is there a way to implement this?
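
For reference, here is a minimal sketch of what a vectorized version of the loop above might look like, using torch.nn.functional.unfold to gather all S*S patches at once. It rests on several assumptions that are not stated above: S is odd, borders are zero-padded, and the S*S axis of K is flattened row by row. The function name spatially_varying_conv is made up for illustration.

```python
import torch
import torch.nn.functional as F


def spatially_varying_conv(T, K, S):
    """Apply a different S x S kernel at every pixel.

    T: (H, W, C) input, K: (H, W, S*S) per-pixel kernels, returns (H, W, C).
    Assumes odd S, zero padding at the borders, and row-major kernel layout.
    """
    H, W, C = T.shape
    # unfold expects a batched, channels-first tensor: (1, C, H, W)
    x = T.permute(2, 0, 1).unsqueeze(0)
    # Extract every S x S neighbourhood: (1, C*S*S, H*W)
    patches = F.unfold(x, kernel_size=S, padding=S // 2)
    # Split channel and kernel-position axes: (C, S*S, H, W)
    patches = patches.view(C, S * S, H, W)
    # Move the kernel-position axis of K to the front: (S*S, H, W)
    weights = K.permute(2, 0, 1)
    # Weighted sum over the S*S kernel positions, output as (H, W, C)
    return torch.einsum("ckhw,khw->hwc", patches, weights)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    H, W, C, S = 32, 48, 3, 5
    T = torch.randn(H, W, C, device=device)
    K = torch.randn(H, W, S * S, device=device)
    R = spatially_varying_conv(T, K, S)
    print(R.shape)
```

Note that the unfolded tensor holds C*S*S*H*W values, so for large images it may be necessary to process the image in tiles to keep memory use manageable.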