I just read the paper about MobileNet v1.

I’ve already known the mechanism behind that.

But in pytorch, how can I achieve that base on Conv2d?

should I just use 2 Conv2d to achieve that?

You would normally set the `groups`

parameter of the Conv2d layer. From the docs:

The configuration when groups == in_channels and out_channels = K * in_channels where K is a positive integer is termed in literature as depthwise convolution.

You can find more information here.

thanks for you information!! but I do not think they’re same things.

From my perspective,

`group`

means to separate the channels.

For e.g., the input of the image is DFxDFxM, the output is DFxDFxN, the original convolution is: DKxDKxMxN

What I mean `Depthwise Separable Convolution`

can be divided into 2 parts:

part 1: Depthwise, the convolution of this part is DKxDKx1xM

part 2: Pointwise, the convolution of this part is 1x1xMxN

If the situation is like that, should I just use 2 Conv2d to achieve that?

Does anyone have some ideas?

I believe this answer is a more complete reply to your question.

If groups = nInputPlane, then it is Depthwise. If groups = nInputPlane, kernel=(K, 1), (and before is a Conv2d layer with groups=1 and kernel=(1, K)), then it is separable.

In short, you can achieve it using Conv2d, by setting the groups parameters of your convolutional layers. Hope it helps.

Here is a simple example:

```
class depthwise_separable_conv(nn.Module):
def __init__(self, nin, nout):
super(depthwise_separable_conv, self).__init__()
self.depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin)
self.pointwise = nn.Conv2d(nin, nout, kernel_size=1)
def forward(self, x):
out = self.depthwise(x)
out = self.pointwise(out)
return out
```

For reference: see numpy implementations of depthwise convolutions vs. grouped convolutions to see how this works.

I am checking the shape of weights in a Conv2d layer, let’s say, the conv para is Conv2d(nin, 4*nin, groups=nin). The shape of weights in this layer is (4*nin, 1, 3, 3). Do you know how to interpret this shape?

(4 kernels for channel 1, 4 kernels for channel 2, …) or (1 kernel for channel 1, 1 kernel for channel 2, …, 1 kernel for channel 1, 1 kernel for channel 2, …)

Thanks,

I think the first understanding is right

Hi, in reply to @shicai 's example, this means you have only 1 kernel per layer, which is unlikely to be helpful.

Maybe something like this:

```
class depthwise_separable_conv(nn.Module):
def __init__(self, nin, kernels_per_layer, nout):
super(depthwise_separable_conv, self).__init__()
self.depthwise = nn.Conv2d(nin, nin * kernels_per_layer, kernel_size=3, padding=1, groups=nin)
self.pointwise = nn.Conv2d(nin * kernels_per_layer, nout, kernel_size=1)
def forward(self, x):
out = self.depthwise(x)
out = self.pointwise(out)
return out
```

You can just set group=nin/kernels_per_layer instead of nin*kernels_per_layer.

Can you help me in figuring out how do I modify the Depthwise convolution to full convolution? I have them as follows -

```
class ConvBNReLU(nn.Sequential):
def __init__(self, in_planes, out_planes, kernel_size, stride=1, groups=1):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
nn.BatchNorm2d(out_planes),
nn.ReLU6(inplace=True)
)
```

```
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, kernel, stride, expand_ratio, res_connect):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = int(round(inp * expand_ratio))
self.use_res_connect = res_connect
layers = []
if expand_ratio != 1:
# pw
layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
layers.extend([
# depth-wise
ConvBNReLU(hidden_dim, hidden_dim, kernel_size=kernel, stride=stride, groups=hidden_dim),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
nn.BatchNorm2d(oup),
])
self.conv = nn.Sequential(*layers)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
```

I’m not sure if I understand the question correctly, but if you don’t want to use the depthwise comvolution but a “vanilla” convolution, you could remove the `groups`

argument or set it to `1`

.