First convolution is slow

When training a multi-layer CNN, I found that the first convolution operation is much slower than the following ones: 0.3 s vs 0.0003 s. The testing conditions are:

Hardware: NVIDIA GTX 1080 Ti
Library: the latest PyTorch built from source, with CUDA 8.0 and cuDNN 6.0.

Here is the test code and the result.

from __future__ import print_function
import torch
import torch.nn as nn
from torch.autograd import Variable
from time import time

class TestConv(nn.Module):
    def __init__(self):
        super(TestConv, self).__init__()
        self.conv1 = nn.Conv2d(32, 32, 3, 1)
        self.conv2 = nn.Conv2d(32, 32, 3, 1)

    def forward(self, x):
        t1 = time()
        x = self.conv1(x)
        t2 = time()
        x = self.conv2(x)
        t3 = time()
        print('Conv1 Time: {}\t Conv2 Time: {}'.format(t2 - t1, t3 - t2))
        return x

if __name__ == '__main__':
    x = Variable(torch.randn(1, 32, 224, 224).cuda(), requires_grad=False)
    cnns = TestConv()
    cnns.cuda()
    x = cnns(x)
Conv1 Time: 0.349282979965       Conv2 Time: 0.000304937362671

That's because conv1 is the first computation you run on the GPU. Warm it up first:

if __name__ == '__main__':
    x = Variable(torch.randn(1, 32, 224, 224).cuda(), requires_grad=False)
    cnns = TestConv()
    cnns.cuda()
    for _ in range(100):
        cnns(x)

and you'll see that only the very first call is slow.
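To make the comparison fair, the measurement itself also needs care: CUDA kernel launches are asynchronous, so `time()` around a call can measure only the launch, not the execution. Here is a minimal sketch of warm-up plus synchronized timing, written with the current tensor API (no `Variable`) and falling back to CPU when no GPU is present:

```python
import torch
import torch.nn as nn
from time import time

conv = nn.Conv2d(32, 32, 3, 1)
x = torch.randn(1, 32, 224, 224)
if torch.cuda.is_available():
    conv, x = conv.cuda(), x.cuda()

# Warm-up: the first few calls absorb one-time setup costs
# (CUDA context creation, cuDNN initialization).
for _ in range(10):
    conv(x)

# Synchronize before and after the timed call so we measure
# kernel completion, not just the asynchronous launch.
if torch.cuda.is_available():
    torch.cuda.synchronize()
t0 = time()
y = conv(x)
if torch.cuda.is_available():
    torch.cuda.synchronize()
print('steady-state conv time: {:.6f} s'.format(time() - t0))
```

With a 3x3 kernel and stride 1 on a 224x224 input, `y` has spatial size 222x222.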

Yes… I know. I just wanted to ask why this happens in PyTorch, because MXNet doesn't have this problem. I haven't tested other frameworks yet, but it seems to be specific to PyTorch.

I don't know much about it either. My guess is that it's something like a cold start.
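The cold-start cost can be isolated by timing the first and second forward passes separately. A hedged sketch (the helper name `timed_conv` is made up for illustration; it synchronizes only when the input is on a GPU, so it also runs on CPU):

```python
import torch
import torch.nn as nn
from time import time

def timed_conv(conv, x):
    # Synchronize so we measure kernel completion rather than
    # just the asynchronous launch (no-op on CPU tensors).
    if x.is_cuda:
        torch.cuda.synchronize()
    t0 = time()
    y = conv(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return y, time() - t0

conv = nn.Conv2d(32, 32, 3, 1)
x = torch.randn(1, 32, 224, 224)
if torch.cuda.is_available():
    conv, x = conv.cuda(), x.cuda()

_, first = timed_conv(conv, x)   # includes any one-time cold-start cost
_, second = timed_conv(conv, x)  # steady-state cost
print('first: {:.4f} s  second: {:.4f} s'.format(first, second))
```

On a GPU the first measurement absorbs context creation and cuDNN setup; the second reflects steady-state performance.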