How to share weights of dilated convolutional kernels?

Hello all,

I would like to know how to share weights of dilated convolutional kernels. For example:

self.conv1 = nn.Conv2d(in_channels=2, out_channels=16, kernel_size=5, stride=1, padding=2, dilation=1)
self.conv2 = nn.Conv2d(in_channels=2, out_channels=16, kernel_size=5, stride=1, padding=2, dilation=2)
self.conv3 = nn.Conv2d(in_channels=2, out_channels=16, kernel_size=5, stride=1, padding=2, dilation=3)
self.conv4 = nn.Conv2d(in_channels=2, out_channels=16, kernel_size=5, stride=1, padding=2, dilation=4)

I would like to share weights between those convolutional layers.


Just define it once and call it as many times as you need.
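For the simple case where every call uses the same hyperparameters, a minimal sketch of that idea looks like this (note that dilation is fixed when the module is constructed, so this alone doesn't cover different dilations):

```python
import torch
import torch.nn as nn

# A single module holds one set of weights; calling it repeatedly
# reuses (shares) those same weights on every call.
conv = nn.Conv2d(in_channels=2, out_channels=16, kernel_size=5,
                 stride=1, padding=2, dilation=1)

x1 = torch.randn(8, 2, 16, 16)
x2 = torch.randn(8, 2, 16, 16)

# Both calls use the exact same conv.weight and conv.bias.
y1 = conv(x1)
y2 = conv(x2)
print(y1.shape, y2.shape)  # torch.Size([8, 16, 16, 16]) both times
```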

Interesting problem!

While I do see some murmurs here and there about the ability to share weights between modules, it seems pretty frowned upon.

I believe the better idea would be to use trainable parameters and make a functional call to conv2d using those trainable parameters.

Something like this would probably work for you:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Conv(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()

        # Register parameters that are trainable
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels, kernel_size, kernel_size))
        self.bias = nn.Parameter(torch.randn(out_channels))

    def forward(self, x, stride, padding, dilation):
        # Do a functional call so we can use the same weights but different arguments
        return F.conv2d(
            x, self.weight, bias=self.bias, stride=stride,
            padding=padding, dilation=dilation
        )
# Example creation of module
conv = Conv(2, 16, 5)

# Example input
x = torch.randn((8, 2, 16, 16))

# Example usage with different dilation values
# (with kernel_size=5, padding = 2 * dilation keeps the output
# the same spatial size as the input)
y1 = conv(x, 1, 2, 1)
y2 = conv(x, 1, 4, 2)
y3 = conv(x, 1, 6, 3)
print(y1.shape, y2.shape, y3.shape)

You probably want to consider initializing the weights differently, however.
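For example, a sketch of initializing the shared parameters the way nn.Conv2d initializes its own (Kaiming-uniform weights, fan-in-scaled uniform bias) instead of plain torch.randn. The shapes (16 output channels, 2 input channels, 5x5 kernel) are just the values from this thread:

```python
import math
import torch
import torch.nn as nn

weight = nn.Parameter(torch.empty(16, 2, 5, 5))
bias = nn.Parameter(torch.empty(16))

# Kaiming-uniform initialization for the weight, as nn.Conv2d does by default.
nn.init.kaiming_uniform_(weight, a=math.sqrt(5))

# Bias drawn uniformly from [-1/sqrt(fan_in), 1/sqrt(fan_in)],
# where fan_in = in_channels * kernel_height * kernel_width.
fan_in = weight.size(1) * weight.size(2) * weight.size(3)
bound = 1 / math.sqrt(fan_in)
nn.init.uniform_(bias, -bound, bound)
```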

Hmmm, I missed the part where he was using different dilations.
So yep, @ayalaa2's code is the way to go.

Hey! Thanks for the answer! It does make sense! Thank you very much! But I was wondering how the gradients are computed. Is it like a normal convolutional layer?

Yes, it should behave as if it were a typical convolution layer. In fact, if you peek at the source code, you'll notice that the PyTorch convolution modules end up just calling their functional counterpart.