Pointwise Conv1d slower than Linear

When I use torch.nn.Conv1d to perform a pointwise (kernel_size=1) convolution, it is significantly slower than torch.nn.Linear, even though I would expect the two operations to run at similar speed.
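To be clear about why I expect similar speed: a Conv1d with kernel_size=1 computes exactly the same affine map as a Linear layer applied to the channel dimension. A minimal sketch (shapes chosen just for illustration) that copies the weights across and checks the outputs match:

```python
import torch

# A 1x1 Conv1d and a Linear compute the same affine map when their
# weights are shared; the conv weight just has an extra trailing dim.
lin = torch.nn.Linear(4, 6)
conv = torch.nn.Conv1d(4, 6, kernel_size=1)
with torch.no_grad():
    conv.weight.copy_(lin.weight.unsqueeze(-1))  # (6, 4) -> (6, 4, 1)
    conv.bias.copy_(lin.bias)

x = torch.randn(2, 10, 4)                        # Batch x Time x Channel
y_lin = lin(x)                                   # Linear acts on the last dim
y_conv = conv(x.transpose(1, 2)).transpose(1, 2) # Conv1d wants (N, C, L)
print(torch.allclose(y_lin, y_conv, atol=1e-6))  # prints True
```

So any speed difference should come from the kernels the backend picks, not from the math.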

Update: the following code and results are updated according to what @ptrblck suggested.

import torch
import time

torch.backends.cudnn.benchmark = True


def linear(x, times=1000):
    m1 = torch.nn.Linear(512, 1024).cuda()
    m2 = torch.nn.Linear(1024, 512).cuda()
    for _ in range(10):  # warm-up iterations before timing
        m2(m1(x))
    torch.cuda.synchronize()
    start = time.time()
    for i in range(times):
        h = m1(x)
        y = m2(h)
    torch.cuda.synchronize()
    duration = (time.time() - start) / times
    return duration

def conv1d(x, times=1000):
    m1 = torch.nn.Conv1d(512, 1024, kernel_size=1, stride=1).cuda()
    m2 = torch.nn.Conv1d(1024, 512, kernel_size=1, stride=1).cuda()
    x = x.transpose(1, 2)
    for _ in range(10):  # warm-up so cudnn.benchmark can pick an algorithm
        m2(m1(x))
    torch.cuda.synchronize()
    start = time.time()
    for i in range(times):
        h = m1(x)
        y = m2(h)
    torch.cuda.synchronize()
    duration = (time.time() - start) / times
    return duration


if __name__ == '__main__':
    # Batch x Time x Channel
    x = torch.randn(50, 80, 512).cuda()
    print(f'{linear.__name__}: {linear(x):.6f}s')
    print(f'{conv1d.__name__}: {conv1d(x):.6f}s')

Output is:

linear: 0.001002s
conv1d: 0.001185s

I am using:

Hardware: NVIDIA GTX 1080 Ti
Library: the latest PyTorch built from source, with CUDA 8.0 and cuDNN 6.0.

Could you add a synchronization before the start of the timer?
Note that even though the operations should be similar, cuDNN etc. might choose specific algorithms which might be faster or slower than their counterparts.
Also, try to use torch.backends.cudnn.benchmark = True and let the operations run a few times before timing them.
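Putting that advice together, a reusable timing helper might look like the sketch below (the helper name and defaults are my own, not from the thread). It warms up first, and uses CUDA events on GPU tensors so the measurement doesn't depend on host-side clock placement; on CPU it falls back to wall-clock time:

```python
import time
import torch

def time_fn(fn, x, warmup=10, iters=100):
    """Average per-call latency of fn(x), with warm-up first.

    Uses CUDA events when the input lives on the GPU (so kernel time is
    measured on-device), otherwise plain wall-clock timing.
    """
    for _ in range(warmup):  # let cudnn.benchmark settle on an algorithm
        fn(x)
    if x.is_cuda:
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            fn(x)
        end.record()
        torch.cuda.synchronize()
        return start.elapsed_time(end) / 1000.0 / iters  # ms -> s
    t0 = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - t0) / iters
```

With this, both benchmarks can share one timing path, e.g. `time_fn(lambda t: m2(m1(t)), x)`.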

Thanks! I have already rerun the experiment following your advice, but the result still suggests that Conv1d is about 20% slower than Linear.
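One variable that might be worth ruling out (an assumption on my part, not something confirmed above): `x.transpose(1, 2)` returns a non-contiguous view, and the conv path may take a slower route for non-contiguous inputs. Materializing the layout with `.contiguous()` before the timed loop would remove that variable:

```python
import torch

x = torch.randn(50, 80, 512)
xt = x.transpose(1, 2)       # only swaps strides, no data movement
print(xt.is_contiguous())    # prints False
xc = xt.contiguous()         # copies into (N, C, L) memory order
print(xc.is_contiguous())    # prints True
```

If timing the conv on `xc` closes the gap, the difference was the memory layout rather than the convolution itself.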