How to modify a Conv2d to Depthwise Separable Convolution?

Should I just use two Conv2d layers to achieve that?

You would normally set the groups parameter of the Conv2d layer. From the docs:

The configuration when groups == in_channels and out_channels = K * in_channels where K is a positive integer is termed in literature as depthwise convolution.

You can find more information here.
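
As a minimal sketch of the configuration the docs describe (the channel count, K, and input size below are arbitrary illustrative values), the following builds a depthwise layer and inspects the resulting shapes:

```python
import torch
import torch.nn as nn

in_channels, K = 8, 2  # illustrative values, not from the thread

# groups == in_channels and out_channels == K * in_channels
depthwise = nn.Conv2d(in_channels, K * in_channels, kernel_size=3,
                      padding=1, groups=in_channels)

x = torch.randn(1, in_channels, 16, 16)
y = depthwise(x)

out_shape = tuple(y.shape)
# Each filter has a single input channel, hence the 1 in dimension 1.
weight_shape = tuple(depthwise.weight.shape)
print(out_shape, weight_shape)
```

Note that this layer alone is only the depthwise part; it does not mix information across channels.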


Thanks for your information! But I do not think they’re the same thing.
From my perspective,
groups means splitting the channels into separate groups.
For example, if the input is DFxDFxM and the output is DFxDFxN, the original convolution is DKxDKxMxN.
What I mean is that a depthwise separable convolution can be divided into 2 parts:
part 1: Depthwise, the convolution of this part is DKxDKx1xM
part 2: Pointwise, the convolution of this part is 1x1xMxN

If the situation is like that, should I just use 2 Conv2d to achieve that?
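
To make the shapes above concrete, here is a small sketch that counts parameters for both variants. The values of DK, M, and N are arbitrary choices for illustration, and biases are disabled so the counts match the formulas:

```python
import torch.nn as nn

def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())

DK, M, N = 3, 32, 64  # illustrative kernel size, input and output channels

# Standard convolution: DK x DK x M x N
standard = nn.Conv2d(M, N, kernel_size=DK, padding=DK // 2, bias=False)

# Part 1, depthwise: DK x DK x 1 x M (one filter per input channel)
depthwise = nn.Conv2d(M, M, kernel_size=DK, padding=DK // 2, groups=M, bias=False)
# Part 2, pointwise: 1 x 1 x M x N
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)

standard_params = n_params(standard)
separable_params = n_params(depthwise) + n_params(pointwise)
print(standard_params, separable_params)
```

So the two-layer version does use two Conv2d modules, with the depthwise one relying on the groups argument.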


Does anyone have some ideas?

I believe this answer is a more complete reply to your question.

If groups = nInputPlane, then it is Depthwise. If groups = nInputPlane, kernel=(K, 1), (and before is a Conv2d layer with groups=1 and kernel=(1, K)), then it is separable.

In short, you can achieve it using Conv2d, by setting the groups parameters of your convolutional layers. Hope it helps.


Here is a simple example:

import torch.nn as nn

class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, nout):
        super(depthwise_separable_conv, self).__init__()
        # Depthwise: one 3x3 filter per input channel (groups=nin)
        self.depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin)
        # Pointwise: 1x1 convolution that mixes channels and maps nin -> nout
        self.pointwise = nn.Conv2d(nin, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

For reference: see numpy implementations of depthwise convolutions vs. grouped convolutions to see how this works.
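
One way to convince yourself that groups=nin really gives a depthwise convolution is to compare it with convolving each channel separately. This is only a sanity-check sketch with arbitrary sizes, not an efficient implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
nin = 4  # illustrative channel count

depthwise = nn.Conv2d(nin, nin, kernel_size=3, padding=1, groups=nin, bias=False)
x = torch.randn(2, nin, 8, 8)

# Grouped convolution in a single call
grouped_out = depthwise(x)

# Same computation done channel by channel: filter c only sees input channel c
per_channel = [
    F.conv2d(x[:, c:c + 1], depthwise.weight[c:c + 1], padding=1)
    for c in range(nin)
]
manual_out = torch.cat(per_channel, dim=1)

matches = torch.allclose(grouped_out, manual_out, atol=1e-5)
print(matches)
```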

I am checking the shape of the weights in a Conv2d layer. Let’s say the layer is Conv2d(nin, 4*nin, kernel_size=3, groups=nin). The shape of the weights in this layer is (4*nin, 1, 3, 3). Do you know how to interpret this shape?
Is it (4 kernels for channel 1, 4 kernels for channel 2, …) or (1 kernel for channel 1, 1 kernel for channel 2, …, 1 kernel for channel 1, 1 kernel for channel 2, …)?


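I think the first interpretation is right. PyTorch assigns output channels to groups in contiguous blocks, so filters 0 to 3 belong to input channel 1, filters 4 to 7 to input channel 2, and so on.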
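
A quick way to check this interpretation empirically is to feed an input where a single channel is active and see which output channels respond. This is a sketch with small, arbitrary sizes; the weights are set to ones only so that the responding outputs are guaranteed to be nonzero:

```python
import torch
import torch.nn as nn

nin, mult = 3, 4  # illustrative: 3 input channels, 4 filters per channel

conv = nn.Conv2d(nin, mult * nin, kernel_size=3, padding=1, groups=nin, bias=False)
nn.init.ones_(conv.weight)

active = []
with torch.no_grad():
    for c in range(nin):
        x = torch.zeros(1, nin, 5, 5)
        x[:, c] = 1.0  # only input channel c carries signal
        y = conv(x)
        # Indices of output channels that produced any nonzero value
        responding = (y.abs().sum(dim=(0, 2, 3)) > 0).nonzero().flatten().tolist()
        active.append(responding)

print(active)
```

Each input channel drives a contiguous block of 4 output channels, which matches the "4 kernels for channel 1, 4 kernels for channel 2, …" reading.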

Hi, in reply to @shicai's example: this gives you only 1 kernel per input channel, which may be limiting.

Maybe something like this:

class depthwise_separable_conv(nn.Module):
    def __init__(self, nin, kernels_per_layer, nout):
        super(depthwise_separable_conv, self).__init__()
        self.depthwise = nn.Conv2d(nin, nin * kernels_per_layer, kernel_size=3, padding=1, groups=nin)
        self.pointwise = nn.Conv2d(nin * kernels_per_layer, nout, kernel_size=1)

    def forward(self, x):
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

You can just set group=nin/kernels_per_layer instead of nin*kernels_per_layer.

Can you help me figure out how to modify the depthwise convolution into a full convolution? I have them as follows:

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size, stride=1, groups=1):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
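            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            # The post cuts off here; BatchNorm2d and ReLU6 are assumed from the class name.
            nn.BatchNorm2d(out_planes),
            nn.ReLU6(inplace=True),
        )
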
class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, kernel, stride, expand_ratio, res_connect):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]
        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = res_connect
        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1))
        layers.extend([
            # depth-wise
            ConvBNReLU(hidden_dim, hidden_dim, kernel_size=kernel, stride=stride, groups=hidden_dim),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        return self.conv(x)

I’m not sure if I understand the question correctly, but if you don’t want to use the depthwise convolution but a “vanilla” convolution, you could remove the groups argument or set it to 1.


Hi @ptrblck, this thread is about nn.Conv2d, but StyleGAN uses F.conv2d. Is it possible to implement a separable convolution with F.conv2d instead of nn.Conv2d (like @James_Howard did)? Thank you.

Yes, that’s possible since internally the nn.Conv2d modules will also just call into the functional API F.conv2d. You would have to create the parameters (weight for the filters and bias) in the correct shapes and could then call F.conv2d with the right groups argument.
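
A minimal sketch of that idea, assuming a 3x3 depthwise kernel followed by a 1x1 pointwise kernel. The class name FunctionalSeparableConv and the simple random initialization are illustrative choices, not taken from StyleGAN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FunctionalSeparableConv(nn.Module):  # hypothetical name
    def __init__(self, in_ch, out_ch, kernel_size=3, padding=1):
        super().__init__()
        self.in_ch = in_ch
        self.padding = padding
        # Depthwise weight: (in_ch, 1, k, k), one filter per input channel
        self.dw_weight = nn.Parameter(0.1 * torch.randn(in_ch, 1, kernel_size, kernel_size))
        # Pointwise weight: (out_ch, in_ch, 1, 1), mixes channels
        self.pw_weight = nn.Parameter(0.1 * torch.randn(out_ch, in_ch, 1, 1))
        self.pw_bias = nn.Parameter(torch.zeros(out_ch))

    def forward(self, x):
        # groups=in_ch makes the first call a depthwise convolution
        out = F.conv2d(x, self.dw_weight, padding=self.padding, groups=self.in_ch)
        return F.conv2d(out, self.pw_weight, self.pw_bias)

layer = FunctionalSeparableConv(3, 64)
y = layer(torch.randn(2, 3, 32, 32))
out_shape = tuple(y.shape)
print(out_shape)
```

Because the weights are plain parameters, they can be scaled or reshaped inside forward before being passed to F.conv2d.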

hi @ptrblck , thank you for your suggestion, yes, sometimes, we could do it according to your suggestion:

Because we don’t need to update the weight manually
however, for this one:

We have to do something additional to the weight in the forward function before the convolution, so we have to pass the weight as one of the input parameters. That does not seem possible for a separable convolution class, since it has 2 convs. Could you please tell me how to do it? Thank you so much.

I don’t fully understand the issue you are mentioning. The linked code shows some scaling and reshaping of the weight parameter. Wouldn’t it be possible to apply the same logic to your custom depthwise conv layer?

Sorry for the confusion. Yes, the code does some reshaping and scaling of the weight. Thanks for the suggestion, let me try to apply the same operations to my customized depthwise convolution. I am just wondering, what about the pointwise convolution?

I tried to modify my separable convolution as follows:

class dsc(nn.Module):
    def __init__(self, in_ch = 3, out_ch = 3, kernel_size=3, stride=1, padding=1, bias = True):
        super(dsc, self).__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=kernel_size, stride = stride, padding = padding, bias= bias, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=bias)

    def _initialize_weights(self):
        .........init code.............
    def forward(self, x, weight = None):
        if weight is not None:
            self.depthwise.weight = torch.nn.Parameter(weight)
        out = self.depthwise(x)
        out = self.pointwise(out)
        return out

I added weight to the forward function so I could easily replace the F.conv2d operations, but I got an error like:

Exception has occurred: RuntimeError
Given groups=3, expected weight to be divisible by 3 at dimension 0, but got weight of size [[64, 3, 1, 1]] instead

So I think self.depthwise.weight = torch.nn.Parameter(weight) is wrong, but I haven’t figured out how to solve it.

self.pointwise.weight = torch.nn.Parameter(weight) works fine, but the result may change a lot, as there is no scaling on the depthwise weights.
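
For what it's worth, the error message is consistent with the shapes involved: a weight of size [64, 3, 1, 1] has the shape of a pointwise (1x1) filter bank, while the depthwise layer with groups=3 and a 3x3 kernel expects a weight of shape [3, 1, 3, 3]. One possible way to accept external weights without reassigning nn.Parameter inside forward is to call F.conv2d directly. This is only a sketch: the dw_weight and pw_weight arguments and the class name DSC are hypothetical, and any scaling would have to be applied to the tensors before passing them in:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSC(nn.Module):  # hypothetical variant of the dsc class above
    def __init__(self, in_ch=3, out_ch=64, kernel_size=3, stride=1, padding=1):
        super().__init__()
        self.in_ch, self.stride, self.padding = in_ch, stride, padding
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride, padding, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x, dw_weight=None, pw_weight=None):
        # Fall back to the module's own parameters when no override is given.
        # dw_weight must be (in_ch, 1, k, k); pw_weight must be (out_ch, in_ch, 1, 1).
        dw = self.depthwise.weight if dw_weight is None else dw_weight
        pw = self.pointwise.weight if pw_weight is None else pw_weight
        out = F.conv2d(x, dw, self.depthwise.bias, self.stride, self.padding,
                       groups=self.in_ch)
        return F.conv2d(out, pw, self.pointwise.bias)

layer = DSC()
x = torch.randn(1, 3, 16, 16)
external_pw = 0.5 * torch.randn(64, 3, 1, 1)   # pointwise-shaped weight
external_dw = 0.5 * torch.randn(3, 1, 3, 3)    # depthwise-shaped weight
out_shape = tuple(layer(x, dw_weight=external_dw, pw_weight=external_pw).shape)
print(out_shape)
```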