Hi,

I found that, under the same conditions, using dilated convolutions in the residual unit makes training several times slower than with the original convolutions.

What I did is the following:

Replace

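```
import torch.nn as nn


class Block(nn.Module):
    def __init__(self, act=nn.ReLU(True)):  # note: `act` is not used below
        super(Block, self).__init__()
        # 5x5 kernels, dilation 1; padding=2 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(64, 64, 5, padding=2, dilation=1)
        self.conv2 = nn.Conv2d(64, 64, 5, padding=2, dilation=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        res = self.conv1(x)
        res = self.relu(res)  # in-place ReLU, so `res` itself is rectified
        res = self.conv2(res)
        res += x  # identity skip connection
        return res
```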

with

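```
import torch.nn as nn


class Block(nn.Module):
    def __init__(self, act=nn.ReLU(True)):  # note: `act` is not used below
        super(Block, self).__init__()
        # 3x3 kernels, dilation 2: each covers a 5x5 extent;
        # padding=2 keeps the spatial size unchanged
        self.conv1 = nn.Conv2d(64, 64, 3, padding=2, dilation=2)
        self.conv2 = nn.Conv2d(64, 64, 3, padding=2, dilation=2)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        res = self.conv1(x)
        res = self.relu(res)  # in-place ReLU, so `res` itself is rectified
        res = self.conv2(res)
        res += x  # identity skip connection
        return res
```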
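To make the comparison reproducible, a minimal timing sketch along these lines could be used. It is an assumption-laden sketch, not the setup of the original experiment: the class name `ResBlock`, the helper `time_training_step`, the batch size, input resolution, number of stacked blocks and iteration count are all hypothetical choices. It uses a CUDA GPU when available and synchronizes before reading the clock, because CUDA kernels run asynchronously.

```python
import time

import torch
import torch.nn as nn


class ResBlock(nn.Module):
    """Hypothetical conv -> ReLU -> conv block with identity skip, same structure as `Block`."""

    def __init__(self, kernel_size, dilation):
        super().__init__()
        # padding chosen so the output keeps the input's spatial size
        pad = dilation * (kernel_size - 1) // 2
        self.conv1 = nn.Conv2d(64, 64, kernel_size, padding=pad, dilation=dilation)
        self.conv2 = nn.Conv2d(64, 64, kernel_size, padding=pad, dilation=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        res = self.relu(self.conv1(x))
        res = self.conv2(res)
        return res + x


def time_training_step(model, x, iters=10):
    """Average seconds per forward + backward + SGD step, measured after one warm-up step."""
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    def step():
        opt.zero_grad()
        loss = model(x).mean()
        loss.backward()
        opt.step()

    step()  # warm-up so one-time initialization is not timed
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if x.is_cuda:
        torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock
    return (time.perf_counter() - start) / iters


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # hypothetical input size; adjust to match the real training setup
    x = torch.randn(8, 64, 32, 32, device=device)
    for name, k, d in [("5x5, dilation 1", 5, 1), ("3x3, dilation 2", 3, 2)]:
        model = nn.Sequential(*[ResBlock(k, d) for _ in range(4)]).to(device)
        print(f"{name}: {time_training_step(model, x) * 1000:.1f} ms/step")
```

Running both configurations on the same hardware and input size isolates the effect of the convolution type from the rest of the training pipeline (data loading, loss, logging).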

In fact, for the same receptive field, dilated convolution reduces the number of parameters, so why does it lead to such a significant increase in training time?
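The claim can be checked with a little arithmetic. The pure-Python sketch below (all helper names are hypothetical; it assumes stride-1 convolutions with bias, as in the blocks above) computes the receptive field, parameter count and multiply-accumulates (MACs) per output pixel for the two stacked convolutions in each block.

```python
def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation d."""
    return d * (k - 1) + 1


def stacked_receptive_field(layers):
    """Receptive field of stride-1 convs applied in sequence; layers = [(k, d), ...]."""
    rf = 1
    for k, d in layers:
        rf += effective_kernel(k, d) - 1
    return rf


def conv_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a Conv2d-style layer; dilation does not change this."""
    return c_out * c_in * k * k + (c_out if bias else 0)


def macs_per_pixel(c_in, c_out, k):
    """Multiply-accumulates per output pixel, ignoring the bias add."""
    return c_out * c_in * k * k


for name, k, d in [("5x5, dilation 1", 5, 1), ("3x3, dilation 2", 3, 2)]:
    rf = stacked_receptive_field([(k, d), (k, d)])
    params = 2 * conv_params(64, 64, k)
    macs = 2 * macs_per_pixel(64, 64, k)
    print(f"{name}: receptive field {rf}x{rf}, {params} params, {macs} MACs/pixel")
```

Both blocks cover a 9x9 receptive field, but the dilated block has 73,856 parameters and 73,728 MACs per pixel versus 204,928 and 204,800 for the 5x5 block, i.e. 9/25 of the arithmetic. This suggests the slowdown is not explained by the operation count alone.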

Could you tell me what the reason is and how to solve the problem?