I did not see any PyTorch implementation of this, so I want to give it a try.
From the paper, they apply a tiny MLP (one or a few layers) after the conv kernel, so this MLP acts locally instead of globally. I have seen some code that just attaches a conv2d with kernel size 1 to a conv2d with kernel size 3, which is different from the original paper; a sketch of that version is below for comparison.
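For reference, this is roughly the 1x1-conv style I have seen in other repos (just a sketch; the channel sizes 3 -> 16 are arbitrary, picked only for illustration):

import torch
import torch.nn as nn

nin_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # the "real" conv kernel
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=1),            # 1x1 convs playing the role of the MLP
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=1),
    nn.ReLU(),
)
print(nin_block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])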
Because the convolution in a CNN, before going to ReLU, is just an element-wise multiplication followed by a sum, the mlpconv should be something like below:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Suppose we have a conv kernel K, which is 3x3 in size, and
# it is about to do the element-wise multiplication with the image patch P,
# which is also 3x3 in size (think of a 3x3 area of an image of size NxM).
# Here * represents element-wise multiplication.
K = torch.randn(3, 3)  # example kernel
P = torch.randn(3, 3)  # example image patch
mlp1 = nn.Linear(9, 9)
mlp2 = nn.Linear(9, 1)
relu = F.relu  # F.relu is already a function, so no parentheses here
# The mlpconv action is then equal to a convolution where
# each step does the following calculation:
out = relu(mlp2(relu(mlp1((P * K).view(-1)))))
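To run this over a whole feature map instead of a single patch, one possibility (my own sketch, not taken from the paper; the class name PatchMLPConv and all the sizes are made up) is to extract every 3x3 patch with nn.Unfold and push each one through the same two Linear layers, so the MLP weights are shared across positions the way a conv kernel is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMLPConv(nn.Module):
    # Sketch only: applies the per-patch computation above to every 3x3
    # patch of a single-channel image.
    def __init__(self):
        super().__init__()
        self.K = nn.Parameter(torch.randn(9))    # flattened 3x3 kernel
        self.mlp1 = nn.Linear(9, 9)
        self.mlp2 = nn.Linear(9, 1)
        self.unfold = nn.Unfold(kernel_size=3, padding=1)

    def forward(self, x):                        # x: (B, 1, H, W)
        B, _, H, W = x.shape
        patches = self.unfold(x)                 # (B, 9, H*W)
        patches = patches.transpose(1, 2)        # (B, H*W, 9), one row per patch P
        h = F.relu(self.mlp1(patches * self.K))  # element-wise P*K, then the first MLP layer
        out = F.relu(self.mlp2(h))               # (B, H*W, 1)
        return out.reshape(B, 1, H, W)

x = torch.randn(2, 1, 28, 28)
print(PatchMLPConv()(x).shape)                   # torch.Size([2, 1, 28, 28])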