When I use `torch.nn.Conv1d` to perform pointwise convolution, it seems significantly slower than `torch.nn.Linear`, while I assume these two operations should have similar speed.
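To make that assumption concrete: a `Conv1d` with `kernel_size=1` should compute the same result as a `Linear` layer applied along the channel dimension. Below is a minimal sketch of that check, run on CPU. The small input size and the weight copying are only for illustration and are not part of the benchmark.

```python
import torch

torch.manual_seed(0)

# Same layer sizes as the first layer in the benchmark.
linear = torch.nn.Linear(512, 1024)
conv = torch.nn.Conv1d(512, 1024, kernel_size=1)

# Linear weight is (out, in); Conv1d weight is (out, in, kernel_size=1).
# Copy the Linear parameters into the Conv1d so both layers share weights.
with torch.no_grad():
    conv.weight.copy_(linear.weight.unsqueeze(-1))
    conv.bias.copy_(linear.bias)

# Small illustrative input: Time x Batch x Channel.
x = torch.randn(5, 8, 512)

y_linear = linear(x)                              # (5, 8, 1024)
# Conv1d expects channels in dim 1, so transpose in and back out.
y_conv = conv(x.transpose(1, 2)).transpose(1, 2)  # (5, 8, 1024)

same = torch.allclose(y_linear, y_conv, atol=1e-4)
max_abs_diff = (y_linear - y_conv).abs().max().item()
print(f'outputs match: {same}, max abs diff: {max_abs_diff:.2e}')
```

If the outputs agree up to floating-point error, the two layers do the same arithmetic, and any speed difference comes from how each one is dispatched.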

**Update**: the following code and results are updated according to what @ptrblck suggested.

```
import torch
import time

torch.backends.cudnn.benchmark = True


def linear(x, times=1000):
    m1 = torch.nn.Linear(512, 1024).cuda()
    m2 = torch.nn.Linear(1024, 512).cuda()
    # Wait for pending GPU work before starting the timer.
    torch.cuda.synchronize()
    start = time.time()
    for i in range(times):
        h = m1(x)
        y = m2(h)
    # Wait for all queued kernels to finish before stopping the timer.
    torch.cuda.synchronize()
    duration = (time.time() - start) / times
    return duration


def conv1d(x, times=1000):
    m1 = torch.nn.Conv1d(512, 1024, kernel_size=1, stride=1).cuda()
    m2 = torch.nn.Conv1d(1024, 512, kernel_size=1, stride=1).cuda()
    # Conv1d expects the channel dimension at index 1.
    x = x.transpose(1, 2)
    torch.cuda.synchronize()
    start = time.time()
    for i in range(times):
        h = m1(x)
        y = m2(h)
    torch.cuda.synchronize()
    duration = (time.time() - start) / times
    return duration


if __name__ == '__main__':
    # Time x Batch x Channel
    x = torch.randn(50, 80, 512).cuda()
    print(f'{linear.__name__}: {linear(x):.6f}s')
    print(f'{conv1d.__name__}: {conv1d(x):.6f}s')
```

Output is:

linear: 0.001002s

conv1d: 0.001185s
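As a cross-check on the manual timing, the same comparison could be sketched with `torch.utils.benchmark.Timer`, which does a warm-up run and handles CUDA synchronization itself. This assumes a PyTorch version recent enough to ship `torch.utils.benchmark`. The `mean_time` helper, the single-layer setup, and the CPU fallback are my own simplifications, so its numbers will not match the ones above.

```python
import torch
from torch.utils import benchmark

# Fall back to CPU so the sketch still runs without a GPU.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

linear = torch.nn.Linear(512, 1024).to(device)
conv = torch.nn.Conv1d(512, 1024, kernel_size=1).to(device)

# Time x Batch x Channel, as in the benchmark above.
x = torch.randn(50, 80, 512, device=device)
x_t = x.transpose(1, 2)  # channels moved to dim 1 for Conv1d


def mean_time(module, inp, number=20):
    """Hypothetical helper: mean seconds per call of module(inp)."""
    timer = benchmark.Timer(
        stmt='module(inp)',
        globals={'module': module, 'inp': inp},
    )
    return timer.timeit(number).mean


t_linear = mean_time(linear, x)
t_conv = mean_time(conv, x_t)
print(f'linear: {t_linear:.6f}s')
print(f'conv1d: {t_conv:.6f}s')
```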

I am using:

Hardware: NVIDIA GTX 1080 Ti

Library: The latest PyTorch built from source with CUDA 8.0 and cuDNN 6.0.