I did not see any PyTorch implementation of this, so I want to give it a try.
From the paper, they apply a tiny MLP (one or a few layers) after the conv kernel, so this MLP acts locally instead of globally. I have seen some code that just attaches a conv2d with kernel size 1 to a conv2d with kernel size 3, which is different from the original paper; a sketch of that version is below for comparison.
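For reference, this is roughly the 1x1-conv style I have seen in other repos (just a sketch; the channel sizes 3 -> 16 are arbitrary, picked only for illustration):

import torch
import torch.nn as nn

nin_block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # the "real" conv kernel
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=1),            # 1x1 convs playing the role of the MLP
    nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=1),
    nn.ReLU(),
)
print(nin_block(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])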
Because the convolution in a CNN, before going to ReLU, is just an element-wise multiplication followed by a sum, the mlpconv should be something like below:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Suppose we have a conv kernel K, which is 3x3 in size, and
# it is about to do the element-wise multiplication with the image patch P,
# which is also 3x3 in size (think of a 3x3 area of an image of size NxM).
# Here * represents element-wise multiplication.
K = torch.randn(3, 3)  # example kernel
P = torch.randn(3, 3)  # example image patch
mlp1 = nn.Linear(9, 9)
mlp2 = nn.Linear(9, 1)
relu = F.relu  # F.relu is already a function, so no parentheses here
# The mlpconv action is then equal to a convolution where
# each step does the following calculation:
out = relu(mlp2(relu(mlp1((P * K).view(-1)))))
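To run this over a whole feature map instead of a single patch, one possibility (my own sketch, not taken from the paper; the class name PatchMLPConv and all the sizes are made up) is to extract every 3x3 patch with nn.Unfold and push each one through the same two Linear layers, so the MLP weights are shared across positions the way a conv kernel is:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchMLPConv(nn.Module):
    # Sketch only: applies the per-patch computation above to every 3x3
    # patch of a single-channel image.
    def __init__(self):
        super().__init__()
        self.K = nn.Parameter(torch.randn(9))    # flattened 3x3 kernel
        self.mlp1 = nn.Linear(9, 9)
        self.mlp2 = nn.Linear(9, 1)
        self.unfold = nn.Unfold(kernel_size=3, padding=1)

    def forward(self, x):                        # x: (B, 1, H, W)
        B, _, H, W = x.shape
        patches = self.unfold(x)                 # (B, 9, H*W)
        patches = patches.transpose(1, 2)        # (B, H*W, 9), one row per patch P
        h = F.relu(self.mlp1(patches * self.K))  # element-wise P*K, then the first MLP layer
        out = F.relu(self.mlp2(h))               # (B, H*W, 1)
        return out.reshape(B, 1, H, W)

x = torch.randn(2, 1, 28, 28)
print(PatchMLPConv()(x).shape)                   # torch.Size([2, 1, 28, 28])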