What's the Best Way to Implement a Non-convolutional Network?

I would like to implement an inner product between an image and a set of kernels. The kernels are localized, say 10x10 (out of a 256x256 image) but each kernel is only used at a certain point in contrast to the conventional convolution approach. Does anyone know what’s the most efficient way to implement this computation in pytorch? Thanks a lot in advance!