Dilated Conv2d training is very slow

Hi! Whenever I add convolutional layers with dilation > 1, training slows down by up to 5x. GPU utilization drops as well: it usually stays under 20%, versus 95+% without dilated convs. Using dense kernels that cover the same area (e.g. 5x5 instead of 3x3 with dilation = 2) eliminates the effect. torch.utils.bottleneck shows that the dilated convs are computed with slow_conv_dilated2d.
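For anyone who wants to check which kernel their own dilated conv ends up in, here is a minimal sketch using the autograd profiler. The layer sizes and input shape are made up for illustration and are not taken from my model. It runs on CPU as written, so the op names in the table may differ from what shows up on GPU.

```python
import torch
import torch.nn as nn
from torch.autograd import profiler

# Hypothetical layer and input sizes, chosen only to keep the run short.
conv = nn.Conv2d(8, 8, kernel_size=3, padding=2, dilation=2)
x = torch.randn(2, 8, 32, 32)

# Record every operator called during one forward and backward pass.
with profiler.profile() as prof:
    conv(x).sum().backward()

# Aggregate by operator name and show the most expensive entries first.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=15)
print(report)
```

The rows that mention a convolution show which implementation was actually selected for the layer.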

Is this slowdown an expected behavior, or have I configured something wrong?

System configuration:
OS: Ubuntu 16.04.5
Kernel: 4.15.0-91-generic
NVIDIA driver: 418.87.01
CUDA: 10.0.130
PyTorch: 1.4.0, installed with pip from the repo given in the install instructions

Could you rerun your code with torch.backends.cudnn.benchmark = True and check if you are seeing a speedup?
This setting will profile different cudnn kernels for your current workload and will select the fastest one.
Also, could you post the definition of your dilated convolutions, please?
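A minimal sketch of such a check, timing a dilated 3x3 conv against a dense 5x5 conv with the flag enabled. The `time_conv` helper, the channel counts, and the input shape are assumptions for illustration, not values from your model. It falls back to CPU when no GPU is available, in which case the cudnn flag has no effect.

```python
import time
import torch
import torch.nn as nn

def time_conv(conv, x, iters=10):
    """Average forward+backward time in seconds (hypothetical helper)."""
    # Warm-up passes, so that cudnn autotuning is not included in the timing.
    for _ in range(3):
        conv(x).sum().backward()
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        conv(x).sum().backward()
    if x.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

torch.backends.cudnn.benchmark = True
device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed sizes; both layers keep the 64x64 spatial size.
x = torch.randn(8, 16, 64, 64, device=device)
dilated = nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2).to(device)
dense = nn.Conv2d(16, 16, kernel_size=5, padding=2).to(device)

t_dilated = time_conv(dilated, x)
t_dense = time_conv(dense, x)
print(f"dilated 3x3: {t_dilated * 1e3:.2f} ms, dense 5x5: {t_dense * 1e3:.2f} ms")
```

Running it once with the flag set to True and once with False should show whether the autotuner changes anything for the dilated layer.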

With benchmark = True the time stays the same.
I'm not sure what you mean by definition. I use nn.Conv2d(..., dilation=2) inside an nn.Sequential.

What are the other arguments you are passing to the conv to create it (kernel_size etc.)?

Here is a more complete fragment:

import torch.nn as nn


def _conv_pad_size(k, d=1):
    # For odd k this equals d * (k - 1) // 2, which keeps the spatial size at stride 1.
    return (k - 1) // 2 + k // 2 * (d - 1)


class ConvBlock(nn.Sequential):
    # Conv -> BatchNorm -> MaxPool -> ReLU
    def __init__(self, in_, out, kernel_size=3, dilation=1, pool=1):
        padding = _conv_pad_size(kernel_size, dilation)
        super().__init__(
            nn.Conv2d(in_, out, kernel_size, padding=padding, dilation=dilation),
            nn.BatchNorm2d(out),
            nn.MaxPool2d(pool),
            nn.ReLU(inplace=True)
        )

I use a series of blocks with an increasing number of channels. All blocks have dilation=2, kernel_size=3.

Could you give some pointers where to investigate further?
UPD: Oh, the previous message was not posted as a reply, my bad.