Runtime dependence on hidden nodes


I am training a single-hidden-layer perceptron on the CIFAR-10 dataset. My network thus has 32*32 = 1024 input nodes (I converted the images to greyscale), N hidden nodes, and 10 output nodes.
For a fixed learning rate and number of training epochs, I found that the runtime barely depends on the number N of hidden nodes: e.g. N=200 gives around 5 minutes of runtime, and N=10 still takes almost 5 minutes.
How is this possible? Decreasing N by a factor of 20 also drastically reduces the number of learnable parameters. Does it have something to do with the way PyTorch applies its linear layers?
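For reference, the described setup can be sketched as a small PyTorch module (the class name `OneHiddenLayerNet` is made up for illustration; the sizes follow the description above):

```python
import torch
from torch import nn

class OneHiddenLayerNet(nn.Module):
    """Single-hidden-layer perceptron for greyscale 32x32 CIFAR-10 images."""

    def __init__(self, n_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(32 * 32, n_hidden)  # 1024 greyscale pixels in
        self.fc2 = nn.Linear(n_hidden, 10)       # 10 CIFAR-10 classes out

    def forward(self, x):
        x = x.view(x.size(0), -1)                # flatten to (batch, 1024)
        return self.fc2(torch.relu(self.fc1(x)))
```

With N=200 this has about 207k learnable parameters, with N=10 only about 10k, so the parameter count really does shrink by roughly the factor of 20 mentioned above.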


The actual model workload is tiny, so changing the size of this single layer might not be directly visible in the end-to-end training time.
You would have to profile the code properly to determine how much time is actually spent on the training itself.
The current runtime is likely dominated by data loading and preprocessing, or even by dispatching and kernel-launch overheads.
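One quick sanity check along these lines is to time the pure compute loop on synthetic data that is already in memory, so that data loading is excluded entirely. A minimal CPU sketch (the helper name `time_compute_only` and the batch size are made up for illustration):

```python
import time
import torch
from torch import nn

def time_compute_only(n_hidden: int, n_steps: int = 200) -> float:
    """Time forward + backward + optimizer step on synthetic data only."""
    model = nn.Sequential(
        nn.Linear(32 * 32, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 10)
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(64, 32 * 32)        # fake batch, no DataLoader involved
    y = torch.randint(0, 10, (64,))
    for _ in range(10):                 # warm-up iterations
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return time.perf_counter() - start
```

If both `time_compute_only(200)` and `time_compute_only(10)` are tiny compared to an epoch's wall time, the bottleneck is elsewhere (DataLoader, preprocessing, transfers, or per-step Python/dispatch overhead); on a GPU you would additionally call `torch.cuda.synchronize()` before reading the timer.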


Interesting! I hadn't considered those overheads, as I am usually not focusing on images. Cheers!