PyTorch Training Time

Hi All,

I was training a VGG-16 on the CIFAR10 image classification task with a GTX 1080.

However, I discovered a rather odd phenomenon. The original VGG-16 has convolutional filter counts [64, 128, 256, 512, 512] per block and takes roughly 35 seconds per epoch to train. But when I reduced the filter counts to [32, 64, 128, 256, 256], [16, 32, 64, 128, 128], or [8, 16, 32, 64, 64], all three variants take only about 10 seconds per epoch.

Intuitively, I would expect training time to keep decreasing as the network gets fewer parameters. Is this a common situation?


The timing might be masked by another bottleneck in your code.
The forward/backward pass on the GPU might be fast, but the overall training loop still has to wait, e.g. for the DataLoader to provide the next batch.
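To illustrate why a bottleneck produces exactly this step-function behavior, here is a minimal pure-Python sketch (no GPU needed, hypothetical per-batch times). With a prefetching DataLoader the loader and the GPU work in parallel, so each batch takes roughly the slower of the two; shrinking the GPU time below the loader time then changes nothing:

```python
def epoch_time(gpu_time_per_batch, loader_time_per_batch, n_batches):
    # With prefetching, GPU compute and data loading overlap,
    # so per-batch time is roughly the maximum of the two.
    return n_batches * max(gpu_time_per_batch, loader_time_per_batch)

# Hypothetical numbers: loader needs 0.02 s/batch, ~400 batches per epoch.
# Shrinking the model below the loader's speed gives no further gain.
for gpu_t in [0.09, 0.02, 0.01, 0.005]:
    print(round(epoch_time(gpu_t, 0.02, 400), 1))  # → 36.0, 8.0, 8.0, 8.0
```

This mirrors the observation above: the big model (35 s) is compute-bound, while all the smaller variants hit the same ~10 s floor set by the data pipeline.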

I understand that argument. However, the implementation and the rest of the code are identical; the only difference is the filter count defined in the network class. Perhaps the timing is indeed masked, which would explain why it behaves like a step function rather than a continuous function.
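One way to check this is to time the GPU step and the data fetch separately. A sketch of a timing helper, assuming a standard PyTorch setup (note that CUDA kernels launch asynchronously, so `torch.cuda.synchronize()` is needed for accurate wall-clock numbers; the usage at the bottom is illustrative):

```python
import time
import torch

def timed(fn):
    """Run fn() and return (result, elapsed seconds).

    Synchronizes the GPU before and after so asynchronous CUDA
    kernels are fully counted in the measurement.
    """
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    start = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

# Illustrative usage: compare one compute step against one batch fetch.
x = torch.randn(8, 3, 32, 32)          # stand-in for a CIFAR10 batch
_, compute_t = timed(lambda: (x * 2).sum())
print(compute_t >= 0.0)
```

If the per-batch fetch time from your DataLoader stays near the compute time of the small models, that would confirm the loader as the bottleneck; increasing `num_workers` (and `pin_memory=True` on GPU) in the DataLoader is the usual first remedy.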