Runtime dependence on hidden nodes


I am training a single-hidden-layer perceptron on the CIFAR-10 dataset. My network thus has 32*32 = 1024 input nodes (I converted the images to greyscale), N hidden nodes, and 10 output nodes.
For a fixed learning rate and number of training epochs, I found that the runtime barely depends on the number N of hidden nodes: e.g. N=200 gives around 5 minutes of runtime, and N=10 still takes almost 5 minutes.
How is this possible? Decreasing N by a factor of 20 also drastically reduces the number of learnable parameters. Does it have something to do with the way PyTorch applies its linear layers?
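For reference, the described setup can be sketched as a small PyTorch module (the class name `OneHiddenLayerNet` is made up for illustration; the sizes follow the description above):

```python
import torch
from torch import nn

class OneHiddenLayerNet(nn.Module):
    """Single-hidden-layer perceptron for greyscale 32x32 CIFAR-10 images."""

    def __init__(self, n_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(32 * 32, n_hidden)  # 1024 greyscale pixels in
        self.fc2 = nn.Linear(n_hidden, 10)       # 10 CIFAR-10 classes out

    def forward(self, x):
        x = x.view(x.size(0), -1)                # flatten to (batch, 1024)
        return self.fc2(torch.relu(self.fc1(x)))
```

With N=200 this has about 207k learnable parameters, with N=10 only about 10k, so the parameter count really does shrink by roughly the factor of 20 mentioned above.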


The actual model workload is tiny, so changing the size of this single layer might not be directly visible in the end-to-end training time.
You would have to profile the code properly to determine how much time is actually spent on the training itself.
The current runtime is likely dominated by data loading and preprocessing, or even by dispatching and kernel-launch overheads.
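One quick sanity check along these lines is to time the pure compute loop on synthetic data that is already in memory, so that data loading is excluded entirely. A minimal CPU sketch (the helper name `time_compute_only` and the batch size are made up for illustration):

```python
import time
import torch
from torch import nn

def time_compute_only(n_hidden: int, n_steps: int = 200) -> float:
    """Time forward + backward + optimizer step on synthetic data only."""
    model = nn.Sequential(
        nn.Linear(32 * 32, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 10)
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(64, 32 * 32)        # fake batch, no DataLoader involved
    y = torch.randint(0, 10, (64,))
    for _ in range(10):                 # warm-up iterations
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return time.perf_counter() - start
```

If both `time_compute_only(200)` and `time_compute_only(10)` are tiny compared to an epoch's wall time, the bottleneck is elsewhere (DataLoader, preprocessing, transfers, or per-step Python/dispatch overhead); on a GPU you would additionally call `torch.cuda.synchronize()` before reading the timer.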


Interesting! I hadn't considered those overheads, as I am usually not focusing on images. Cheers!