# 1D Kernel vs 1D Conv speed

I was curious why running a 1D kernel as `Conv3d` with kernel size `(k, 1, 1)` is a bit slower than running `Conv1d(k)` on a reshaped input.

For example, if I define `conv1`, `conv2`, and `conv3` as `nn.Conv1d` layers with kernel size `k`, the code below is around 5x faster:

```python
def forward(self, x):
    # Tensor is NxCxTxHxW
    b, c, t, h, w = x.size()
    # Temporal axis: move T last, flatten, convolve, restore
    conv0 = self.conv1(
        x.permute(0, 1, 3, 4, 2).reshape(b, c, h * w * t)).view(b, c, h, w, t).permute(0, 1, 4, 2, 3)
    # Height axis: move H last
    conv1 = self.conv2(
        x.permute(0, 1, 2, 4, 3).reshape(b, c, t * w * h)).view(b, c, t, w, h).permute(0, 1, 2, 4, 3)
    # Width axis: W is already the last dimension
    conv2 = self.conv3(
        x.reshape(b, c, t * h * w)).view(b, c, t, h, w)
    # Out
    out = self.relu(conv0) + self.relu(conv1) + self.relu(conv2)
    return out
```
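For context, the `forward` above assumes `Conv1d` layers defined roughly like this (the class name, signature, and `padding` choice are my assumptions, not the original code; `padding=k // 2` with odd `k` keeps the flattened length unchanged so the `view` back to 5D is valid):

```python
import torch.nn as nn

class SepConv1d(nn.Module):
    # Hypothetical sketch of the layer definitions assumed by the forward above
    def __init__(self, in_channels, out_channels, k, groups=1, bias=True):
        super().__init__()
        # padding=k//2 (odd k) preserves the flattened length L,
        # so .view(b, c, ...) after the conv works
        self.conv1 = nn.Conv1d(in_channels, out_channels, k,
                               padding=k // 2, groups=groups, bias=bias)  # temporal
        self.conv2 = nn.Conv1d(in_channels, out_channels, k,
                               padding=k // 2, groups=groups, bias=bias)  # height
        self.conv3 = nn.Conv1d(in_channels, out_channels, k,
                               padding=k // 2, groups=groups, bias=bias)  # width
        self.relu = nn.ReLU()
```

One caveat: with this padding, the `Conv1d` kernel slides across the boundaries between consecutive rows of the flattened axis, so outputs near those edges differ slightly from the `Conv3d` version.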

than when using:

```python
# Temporal
self.conv0 = nn.Conv3d(
    in_channels,
    out_channels,
    kernel_size=(kernel_size[0], 1, 1),
    stride=(1, 1, 1),
    groups=groups,
    bias=bias)

# Height
self.conv1 = nn.Conv3d(
    in_channels,
    out_channels,
    kernel_size=(1, kernel_size[1], 1),
    stride=(1, 1, 1),
    groups=groups,
    bias=bias)

# Width
self.conv2 = nn.Conv3d(
    in_channels,
    out_channels,
    kernel_size=(1, 1, kernel_size[2]),
    stride=(1, 1, 1),
    groups=groups,
    bias=bias)

def forward(self, x):
    t0, t1, t2 = self.conv0(x), self.conv1(x), self.conv2(x)
    out = self.relu(t0) + self.relu(t1) + self.relu(t2)
    return out
```

And as far as I'm aware, the two should compute the same operation, shouldn't they?
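One way to sanity-check the equivalence of a single branch (a sketch, with shapes and names of my choosing): copy the `Conv3d` weights into a `Conv1d` and fold H and W into the batch dimension, so each temporal line is convolved independently and there are no row-boundary effects.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
b, c, t, h, w = 2, 3, 8, 4, 5
k, out = 3, 6

conv3d = nn.Conv3d(c, out, kernel_size=(k, 1, 1), bias=False)
conv1d = nn.Conv1d(c, out, kernel_size=k, bias=False)
# Reuse the same weights: (out, in, k, 1, 1) -> (out, in, k)
with torch.no_grad():
    conv1d.weight.copy_(conv3d.weight.view(out, c, k))

x = torch.randn(b, c, t, h, w)
ref = conv3d(x)  # (b, out, t-k+1, h, w)

# Fold H and W into the batch so the 1D conv sees one temporal line at a time
lines = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, t)
y = conv1d(lines).view(b, h, w, out, t - k + 1).permute(0, 3, 4, 1, 2)

print(torch.allclose(ref, y, atol=1e-5))
```

Folding H and W into the batch sidesteps the edge mixing that a single big flatten introduces when padding is used, so the comparison is exact up to floating-point rounding.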

Did you call `torch.cuda.synchronize()` before starting and stopping the timer?
If so, could you disable cudnn via `torch.backends.cudnn.enabled = False` and profile the code again, please?
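Something along these lines (a minimal sketch; the helper name and iteration counts are placeholders): warm up first so one-off costs such as allocator setup and cuDNN autotuning aren't counted, then synchronize immediately before and after the timed region.

```python
import time
import torch

def time_forward(module, x, iters=50, warmup=5):
    """Average forward time per call, with warm-up and CUDA synchronization."""
    with torch.no_grad():
        for _ in range(warmup):
            module(x)  # warm-up iterations, not timed
        if x.is_cuda:
            torch.cuda.synchronize()  # drain queued kernels before starting the clock
        start = time.perf_counter()
        for _ in range(iters):
            module(x)
        if x.is_cuda:
            torch.cuda.synchronize()  # wait for the timed kernels to finish
    return (time.perf_counter() - start) / iters
```

Setting `torch.backends.cudnn.enabled = False` before timing rules out differences in cuDNN algorithm selection between the `Conv1d` and `Conv3d` paths.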