How to implement the MLPconv layer from Network in Network?

I have not seen any PyTorch implementation of this, so I want to give it a try.

From the paper, they apply a tiny MLP (one or a few layers) after the conv kernel, so this MLP acts locally rather than globally. I have seen some code that just attaches a conv2d with kernel size 1 to a conv2d with kernel size 3 (roughly like the sketch below), which is different from the original paper.
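The kind of code I mean looks roughly like this (the channel sizes are just placeholders):

import torch
import torch.nn as nn

# common "mlpconv" block built from 1x1 convolutions
mlpconv = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),  # ordinary 3x3 conv
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1),            # "MLP" layer 1, acting per spatial position
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=1),            # "MLP" layer 2, acting per spatial position
    nn.ReLU(),
)

x = torch.randn(1, 3, 32, 32)
print(mlpconv(x).shape)  # torch.Size([1, 64, 32, 32])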

Because the convolution in a CNN, before the ReLU, is just an element-wise multiplication followed by a sum, the MLPconv should be something like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Suppose we have a conv kernel K of size 3x3, which is about to be
# element-wise multiplied with an image patch P, also 3x3 in size
# (think of a 3x3 area of an NxM image). Here * denotes element-wise
# multiplication.
K = torch.randn(3, 3, requires_grad=True)
P = torch.randn(3, 3)

mlp1 = nn.Linear(9, 9)
mlp2 = nn.Linear(9, 1)

# The MLPconv action is then equal to a convolution in which every
# sliding-window step does this calculation:
out = F.relu(mlp2(F.relu(mlp1((P * K).view(-1)))))

But how can I do this in PyTorch?

I need some ideas and your help.

How would the input channels be treated?
Would the linear layer perform an operation such as:

import torch
import torch.nn as nn

in_channels, height, width = 3, 3, 3
# input patch covering all channels, flattened to a vector
patch = torch.randn(in_channels, height, width).view(-1)
lin = nn.Linear(in_channels * height * width, 1)
out = lin(patch)  # one scalar for the whole patch

If so, then this would be equivalent to a conv layer with non-overlapping windows, no?
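To sanity-check that equivalence (sizes here are arbitrary): the Linear weights can be copied into a Conv2d whose stride equals its kernel size, and the two give the same result on non-overlapping patches.

import torch
import torch.nn as nn
import torch.nn.functional as F

in_channels, k = 3, 3
lin = nn.Linear(in_channels * k * k, 1)
conv = nn.Conv2d(in_channels, 1, kernel_size=k, stride=k)  # stride = kernel size -> no overlap

# copy the Linear parameters into the conv so both compute the same map
with torch.no_grad():
    conv.weight.copy_(lin.weight.view(1, in_channels, k, k))
    conv.bias.copy_(lin.bias)

x = torch.randn(1, in_channels, 9, 9)
out_conv = conv(x)                               # (1, 1, 3, 3)

patches = F.unfold(x, kernel_size=k, stride=k)   # (1, C*k*k, 9): the 9 non-overlapping patches
out_lin = lin(patches.transpose(1, 2))           # (1, 9, 1): one value per patch
out_lin = out_lin.transpose(1, 2).reshape(out_conv.shape)

print(torch.allclose(out_conv, out_lin, atol=1e-6))  # True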

OK, I implemented it with a sliding window. Now both the MLP and the conv2d parameters are updating.

But it is really ugly. I still look forward to your opinions.
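To make it concrete, here is a minimal sketch of one way the sliding window could be vectorized with nn.Unfold, so no explicit Python loop is needed (the class name, hidden width, and padding are placeholders I picked, not something from the paper):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPConv(nn.Module):
    # sketch: run a small MLP over every 3x3 patch, using nn.Unfold
    # instead of an explicit Python loop over sliding windows
    def __init__(self, in_channels, out_channels, kernel_size=3, hidden=16, padding=1):
        super().__init__()
        self.unfold = nn.Unfold(kernel_size, padding=padding)
        patch_dim = in_channels * kernel_size * kernel_size
        self.mlp1 = nn.Linear(patch_dim, hidden)
        self.mlp2 = nn.Linear(hidden, out_channels)

    def forward(self, x):
        n, c, h, w = x.shape
        patches = self.unfold(x)             # (N, C*k*k, L), L = number of window positions
        patches = patches.transpose(1, 2)    # (N, L, C*k*k) so the Linears act on each patch
        out = F.relu(self.mlp1(patches))
        out = F.relu(self.mlp2(out))         # (N, L, out_channels)
        # with kernel_size=3, padding=1 and stride 1, L == h*w,
        # so the per-patch outputs can be reshaped back into a feature map
        return out.transpose(1, 2).reshape(n, -1, h, w)

x = torch.randn(2, 3, 32, 32)
print(MLPConv(3, 64)(x).shape)  # torch.Size([2, 64, 32, 32])

Since everything is built from nn.Linear and plain tensor ops, autograd updates both MLP layers as usual, and the module keeps conv-like input/output shapes.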