I am using PyTorch to train a LeNet model whose network structure is identical to Caffe’s default LeNet. Training 12 epochs takes about 28s in Caffe, but training 10 epochs takes over 100s in PyTorch.

My experiments are all run on a Titan X with CUDA 8.0 and cuDNN v5. The training `batch_size` is 64.

The final test accuracy is similar, but it seems that PyTorch is much slower than Caffe at training LeNet.

Is this PyTorch's real performance, or have I overlooked some detail, such as whether cuDNN is being used?
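For the cuDNN part of the question, here is a minimal sketch of how the relevant settings could be inspected from Python. The helper name `cudnn_report` is made up for illustration, and enabling `benchmark` mode is only something worth trying when input shapes are fixed; I am not claiming it explains the gap.

```python
import torch


def cudnn_report():
    """Collect the cuDNN-related settings PyTorch is currently using.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    return {
        "cuda_available": torch.cuda.is_available(),
        "cudnn_available": torch.backends.cudnn.is_available(),
        "cudnn_enabled": torch.backends.cudnn.enabled,
        "cudnn_version": torch.backends.cudnn.version(),
        "cudnn_benchmark": torch.backends.cudnn.benchmark,
    }


# Ask cuDNN to try several convolution algorithms and cache the fastest one.
# This can help when every batch has the same shape; it is an experiment,
# not a guaranteed speedup.
torch.backends.cudnn.benchmark = True

print(cudnn_report())
```

If `cudnn_available` or `cudnn_enabled` comes back `False`, the convolutions are not going through cuDNN, which would be worth fixing before comparing against Caffe.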

I am new to PyTorch, so any discussion on this post is welcome.

```
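import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, kernel_size=5)
        self.conv2 = nn.Conv2d(20, 50, kernel_size=5)
        self.fc1 = nn.Linear(800, 500)  # 800 = 50 channels * 4 * 4 after the second pooling
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        # Conv layers are followed by max pooling only; ReLU is applied after fc1
        x = F.max_pool2d(self.conv1(x), kernel_size=2, stride=2)
        x = F.max_pool2d(self.conv2(x), kernel_size=2, stride=2)
        x = x.view(-1, 800)  # flatten to (batch, 800)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # normalize over the class dimension


model = Net()
if args.cuda:  # args is defined in the omitted setup code
    model.cuda()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=0.0005)
# training code omitted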
```
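Since the training loop is omitted, one way to narrow down where the time goes is to measure data loading separately from the forward/backward/update step. The following is only a sketch: `time_epoch`, `step_fn`, and `sync` are names made up for illustration, and it assumes the training step can be wrapped in a single callable.

```python
import time


def time_epoch(batches, step_fn, sync=None):
    """Split one epoch's wall-clock time into data loading and compute.

    batches: any iterable yielding batches (for example a DataLoader)
    step_fn: callable that runs forward, backward, and the optimizer step
    sync:    optional callable that blocks until queued GPU work has finished
    """
    load_s = 0.0
    compute_s = 0.0
    n_batches = 0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting for the next batch
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync is not None:
            sync()  # GPU kernels run asynchronously, so wait before stopping the clock
        t2 = time.perf_counter()
        load_s += t1 - t0
        compute_s += t2 - t1
        n_batches += 1
    return {"batches": n_batches, "load_s": load_s, "compute_s": compute_s}
```

With a hypothetical `train_loader` and `train_step`, this could be called as `time_epoch(train_loader, train_step, sync=torch.cuda.synchronize)`. If `load_s` dominates, the slowdown is in the input pipeline rather than in the network itself.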