Hi All,
I was training a VGG-16 on the CIFAR-10 image classification task with a GTX 1080.
However, I discovered a rather weird phenomenon. The original VGG-16 has per-block convolution filter counts of [64, 128, 256, 512, 512] and takes roughly 35 s per training epoch. However, when I reduced the filter counts to [32, 64, 128, 256, 256], [16, 32, 64, 128, 128], or [8, 16, 32, 64, 64], all three variants take only about 10 s per epoch.
Intuitively, I thought training time should keep decreasing as the network has fewer parameters, but instead it seems to plateau at 10 s regardless of how much further I shrink the filters. Is this a common situation?
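In case it helps, here is a minimal sketch of the kind of comparison I mean. It is illustrative only, not my actual training script: I'm assuming PyTorch here, and the `make_vgg16` helper, batch size, and step count are made up for the example.

```python
import time

import torch
import torch.nn as nn
import torch.nn.functional as F

def make_vgg16(widths=(64, 128, 256, 512, 512), num_classes=10):
    """Build a VGG-16-style net whose per-block channel counts are `widths`."""
    layers, in_ch = [], 3
    # VGG-16 has [2, 2, 3, 3, 3] conv layers per block, each block ending in 2x2 max pooling.
    for out_ch, n_convs in zip(widths, (2, 2, 3, 3, 3)):
        for _ in range(n_convs):
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True)]
            in_ch = out_ch
        layers.append(nn.MaxPool2d(2))
    # Five poolings shrink a 32x32 CIFAR-10 image to 1x1, so the classifier
    # input size is just the final channel count.
    return nn.Sequential(*layers, nn.Flatten(), nn.Linear(in_ch, num_classes))

device = "cuda" if torch.cuda.is_available() else "cpu"

def sync():
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels run asynchronously; sync before reading the clock

x = torch.randn(128, 3, 32, 32, device=device)   # one CIFAR-10-sized batch
y = torch.randint(0, 10, (128,), device=device)  # dummy labels

for widths in [(64, 128, 256, 512, 512), (32, 64, 128, 256, 256),
               (16, 32, 64, 128, 128), (8, 16, 32, 64, 64)]:
    model = make_vgg16(widths).to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    sync()
    start = time.time()
    for _ in range(50):  # a few training steps stand in for a full epoch
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
    sync()
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{widths}: {n_params / 1e6:.1f}M params, {time.time() - start:.2f}s / 50 steps")
```

The `sync()` calls matter for the measurement itself, since GPU work is launched asynchronously and the wall clock can otherwise stop before the kernels finish.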
Thanks.