MNIST training: PyTorch vs. Caffe

I trained a LeNet model in PyTorch, with a network structure identical to Caffe’s default LeNet. Training 12 epochs takes about 28 s in Caffe, but training 10 epochs takes over 100 s in PyTorch.

My experiments are all run on a Titan X with CUDA 8.0 and cuDNN v5. The training batch_size is 64.

The final test accuracy is similar, but it seems that PyTorch is much slower than Caffe at LeNet training.
Is this PyTorch’s real performance, or am I overlooking some detail such as cuDNN?

I am new to PyTorch, so anyone is welcome to discuss this in the thread.

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        self.fc1 = nn.Linear(800, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.max_pool2d(self.conv1(x), kernel_size=2, stride=2)
        x = F.max_pool2d(self.conv2(x), kernel_size=2, stride=2)
        x = x.view(-1, 800)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)

model = Net()
if args.cuda:
    model.cuda()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)

# ignore train code

You can’t “ignore train code”, because that is what determines how fast your script runs. In our benchmarks we don’t see large discrepancies between PyTorch and Caffe. One thing that might make it faster is setting torch.backends.cudnn.benchmark = True.
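For reference, here is a minimal sketch of where that flag goes and how a single call could be timed fairly on the GPU. The helper name `timed` is hypothetical and not part of PyTorch; the synchronization calls are there because CUDA kernels launch asynchronously, so wall-clock timings taken without them can be misleading.

```python
import time
import torch

# Let cuDNN autotune its convolution algorithms. This tends to help
# when input shapes stay fixed, as with fixed-size MNIST batches.
torch.backends.cudnn.benchmark = True

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds).

    Hypothetical helper: waits for pending GPU work before and after
    the call so the measured time covers the actual computation.
    """
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start
```

Something like `timed(model, batch)` could then be used to time one forward pass, assuming `model` and `batch` are already on the same device.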

Thanks for your reply. I think I have found the cause: the data loader is a little slow, especially the transform functions. The training time is the same whether torch.backends.cudnn.benchmark is True or False, so PyTorch may already be using cuDNN automatically.
You mentioned that there are no large discrepancies between PyTorch and Caffe. Are there any more details or links?
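One rough way to check whether the data loader is the bottleneck is to split each epoch's wall time into "waiting for the next batch" and "running the training step". The sketch below is only an illustration: `profile_epoch`, `step_fn`, and `sync_fn` are hypothetical names, and it works with any iterable of batches. On a GPU, passing `torch.cuda.synchronize` as `sync_fn` would be needed so that asynchronous kernel time is counted in the step and not in the next data wait.

```python
import time

def profile_epoch(loader, step_fn, sync_fn=None):
    """Split one pass over `loader` into data-wait time and step time."""
    data_s = 0.0
    step_s = 0.0
    batches = 0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)      # time spent here is data loading + transforms
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)            # forward, backward, optimizer step
        if sync_fn is not None:
            sync_fn()             # e.g. torch.cuda.synchronize (assumption)
        t2 = time.perf_counter()
        data_s += t1 - t0
        step_s += t2 - t1
        batches += 1
    return {"batches": batches, "data_s": data_s, "step_s": step_s}
```

If `data_s` dominates `step_s`, the loader and its transforms are the likely culprit, not the network itself.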