Hi! Whenever I add convolutional layers with dilation > 1, my training slows down by up to 5x. GPU utilization drops as well: it usually stays under 20%, vs. 95%+ without dilated convs. Using dense kernels of the same effective size (e.g. 5x5 instead of 3x3 with dilation=2) eliminates the effect. torch.utils.bottleneck shows that the dilated convs are executed with slow_conv_dilated2d.
Is this slowdown expected behavior, or have I misconfigured something?
System configuration:
OS: Ubuntu 16.04.5
Kernel: 4.15.0-91-generic
NVIDIA driver: 418.87.01
CUDA: 10.0.130
PyTorch: 1.4.0, installed with pip following the official instructions
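In case it helps, here is a minimal sketch of the comparison I'm describing (the shapes and channel counts are made up for illustration, not my actual model): a dense 5x5 conv vs. a 3x3 conv with dilation=2, which cover the same 5x5 receptive field.

```python
import time
import torch
import torch.nn as nn

# Hypothetical repro: compare a dense 5x5 kernel against a 3x3 kernel
# with dilation=2 (same effective 5x5 receptive field, same output size).
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, 64, 128, 128, device=device)

dense = nn.Conv2d(64, 64, kernel_size=5, padding=2).to(device)
dilated = nn.Conv2d(64, 64, kernel_size=3, padding=2, dilation=2).to(device)

def bench(conv, n=10):
    # Synchronize around the timed region so CUDA kernel launches
    # don't make the timing meaningless.
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n):
        out = conv(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n, out

t_dense, out_dense = bench(dense)
t_dilated, out_dilated = bench(dilated)
print(f"dense 5x5:   {t_dense * 1e3:.2f} ms/iter, out {tuple(out_dense.shape)}")
print(f"dilated 3x3: {t_dilated * 1e3:.2f} ms/iter, out {tuple(out_dilated.shape)}")
```

On my setup the dilated variant is the slow one, even though it has fewer weights.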
Could you rerun your code with torch.backends.cudnn.benchmark = True and check if you see a speedup?
This setting will profile different cuDNN kernels for your current workload and select the fastest one.
Also, could you post the definition of your dilated convolutions, please?
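Something like this minimal sketch (the conv definition here is just a stand-in, since I haven't seen your actual layers yet). Note that the first iterations pay the autotuning cost, so the speedup only shows up after warmup, and it helps most when the input shapes stay constant across iterations:

```python
import torch
import torch.nn as nn

# Enable cuDNN autotuning: for each new conv workload (shape/dtype/layout),
# cuDNN benchmarks its available algorithms once and caches the fastest.
torch.backends.cudnn.benchmark = True

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical dilated conv standing in for the layers in question;
# padding=2 preserves the spatial size with kernel_size=3, dilation=2.
conv = nn.Conv2d(32, 32, kernel_size=3, dilation=2, padding=2).to(device)
x = torch.randn(4, 32, 64, 64, device=device)

# Warmup iterations absorb the one-time benchmarking overhead;
# subsequent iterations reuse the cached fastest algorithm.
for _ in range(5):
    out = conv(x)
print(out.shape)  # torch.Size([4, 32, 64, 64])
```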