Extend PyTorch with a new pooling module

I want to write a pooling module that performs mode pooling: for each window of the input, the kernel returns the most frequent value in that window. While it’s fairly simple to do this for a single input, it seems more complex to do it for a convolution-style kernel, that is, one that strides over the entire image. I looked at the implementations of max pooling, but those modules call deeper functions in the API that I can’t seem to access or figure out.
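
To make the single-input case concrete, here is a minimal sketch of what I mean, treating the whole input as one window and assuming `torch.mode` is an acceptable way to pick the most frequent value. The tensor values are made up for illustration.

```python
import torch

# One hypothetical 3x3 window; the values are invented for illustration.
window = torch.tensor([[1., 2., 2.],
                       [3., 2., 1.],
                       [4., 1., 2.]])

# torch.mode reduces along a single dimension, so flatten the window first.
# It returns the most frequent value and an index where that value occurs.
mode_value, mode_index = torch.mode(window.flatten(), dim=0)
```

What I am missing is how to apply this reduction to every strided window of an image, the way the built-in pooling layers do.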

For the record, the gradient in this case would be analogous to max pooling: 1 if the element is the mode, 0 otherwise.
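
As a rough illustration of that gradient on a single flattened window, the sketch below shows two readings of it. It assumes that autograd's backward for `torch.mode` routes the gradient to the one index it returned (as max pooling does with its argmax), while the explicit mask marks every element equal to the mode. The values are again invented.

```python
import torch

# Hypothetical flattened window; requires_grad so autograd tracks it.
window = torch.tensor([1., 2., 2., 3., 2., 1.], requires_grad=True)

# Forward: the mode of the window and one index where it occurs.
mode_value, mode_index = torch.mode(window, dim=0)

# Reading 1: let autograd compute the gradient of the mode w.r.t. the window.
# Assumption: the gradient lands only on the element at mode_index.
mode_value.backward()
grad = window.grad

# Reading 2: an explicit mask that is 1 for every element equal to the mode.
mask = (window.detach() == mode_value.detach()).to(window.dtype)
```

The two readings differ only in how ties are handled: the first selects a single element, the second gives a 1 to every occurrence of the mode.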

How can I implement this as a “convolution-style” operation?