Why does dilated convolution take several times longer to train than ordinary convolution?

Hi,
I found that, under otherwise identical conditions, using dilated convolution in a residual unit makes training several times slower than ordinary convolution.
What I did is the following:

Replace

import torch.nn as nn

class Block(nn.Module):
    def __init__(self, act=nn.ReLU(True)):
        super(Block, self).__init__()
        # Two 5x5 convolutions, 64 channels; padding=2 keeps the spatial size
        self.conv1 = nn.Conv2d(64, 64, 5, padding=2, dilation=1)
        self.conv2 = nn.Conv2d(64, 64, 5, padding=2, dilation=1)
        self.relu  = nn.ReLU(inplace=True)
    def forward(self, x):
        res = self.conv1(x)
        res = self.relu(res)
        res = self.conv2(res)
        res += x  # residual connection
        return res

by

class Block(nn.Module):
    def __init__(self, act=nn.ReLU(True)):
        super(Block, self).__init__()
        # Two 3x3 dilated convolutions; dilation=2 with padding=2 gives the
        # same 5x5 receptive field and output size as the block above
        self.conv1 = nn.Conv2d(64, 64, 3, padding=2, dilation=2)
        self.conv2 = nn.Conv2d(64, 64, 3, padding=2, dilation=2)
        self.relu  = nn.ReLU(inplace=True)
    def forward(self, x):
        res = self.conv1(x)
        res = self.relu(res)
        res = self.conv2(res)
        res += x  # residual connection
        return res
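For reference, the slowdown can be reproduced with a minimal forward-pass benchmark of just the two convolution variants (a sketch; the 64×64 input size and CPU timing are my assumptions, not from the original setup — on GPU you would call torch.cuda.synchronize() around the timers):

```python
import time
import torch
import torch.nn as nn

def bench(conv, x, iters=20):
    # Warm up, then average the forward-pass wall time
    for _ in range(3):
        conv(x)
    start = time.time()
    for _ in range(iters):
        conv(x)
    return (time.time() - start) / iters

x = torch.randn(1, 64, 64, 64)
dense   = nn.Conv2d(64, 64, 5, padding=2, dilation=1)  # ordinary 5x5
dilated = nn.Conv2d(64, 64, 3, padding=2, dilation=2)  # dilated 3x3

# Both keep the 1x64x64x64 shape, so the comparison is like for like
print(dense(x).shape, dilated(x).shape)
print(f"dense 5x5:   {bench(dense, x) * 1e3:.2f} ms")
print(f"dilated 3x3: {bench(dilated, x) * 1e3:.2f} ms")
```

This isolates the convolution itself from the rest of the training loop, which makes it easier to confirm whether the dilated layer alone accounts for the gap.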

In fact, for the same receptive field, dilated convolution uses fewer parameters, so why does it lead to such a significant increase in training time?
Could you tell me what the reason is and how to solve the problem?
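To make the parameter claim concrete, the per-layer counts can be worked out directly from the layer shapes in the code above (64 in/out channels; dilation itself adds no parameters):

```python
def conv2d_params(in_ch, out_ch, k):
    # weight tensor: out_ch * in_ch * k * k, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

dense_params   = conv2d_params(64, 64, 5)  # ordinary 5x5 kernel
dilated_params = conv2d_params(64, 64, 3)  # dilated 3x3 kernel

print(dense_params, dilated_params)  # 102464 vs 36928
```

So each dilated layer has roughly a third of the parameters (and FLOPs) of its 5×5 counterpart, which makes the slowdown all the more counterintuitive.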