Runtime dependence on hidden nodes


I am training a single-hidden-layer perceptron on the CIFAR-10 dataset. My network thus has 32*32 = 1024 input nodes (I converted the images to greyscale), N hidden nodes, and 10 output nodes.
For a fixed learning rate and number of training epochs, I found that the runtime barely depends on the number N of hidden nodes: e.g. N=200 gives around 5 minutes of runtime, and N=10 still takes almost 5 minutes.
How is this possible? Decreasing N by a factor of 20 also drastically reduces the number of learnable parameters. Does it have something to do with the way PyTorch applies its linear layers?
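For reference, the described setup can be sketched as a small PyTorch module (the class name `OneHiddenLayerNet` is made up for illustration; the sizes follow the description above):

```python
import torch
from torch import nn

class OneHiddenLayerNet(nn.Module):
    """Single-hidden-layer perceptron for greyscale 32x32 CIFAR-10 images."""

    def __init__(self, n_hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(32 * 32, n_hidden)  # 1024 greyscale pixels in
        self.fc2 = nn.Linear(n_hidden, 10)       # 10 CIFAR-10 classes out

    def forward(self, x):
        x = x.view(x.size(0), -1)                # flatten to (batch, 1024)
        return self.fc2(torch.relu(self.fc1(x)))
```

With N=200 this has about 207k learnable parameters, with N=10 only about 10k, so the parameter count really does shrink by roughly the factor of 20 mentioned above.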


The actual model workload is tiny, so changing the size of this single layer might not be directly visible in the end-to-end training time.
You would have to profile the code properly to determine how much time is actually spent on the training itself.
The current runtime is likely dominated by data loading and preprocessing, or even by dispatching and kernel-launch overheads.
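One quick sanity check along these lines is to time the pure compute loop on synthetic data that is already in memory, so that data loading is excluded entirely. A minimal CPU sketch (the helper name `time_compute_only` and the batch size are made up for illustration):

```python
import time
import torch
from torch import nn

def time_compute_only(n_hidden: int, n_steps: int = 200) -> float:
    """Time forward + backward + optimizer step on synthetic data only."""
    model = nn.Sequential(
        nn.Linear(32 * 32, n_hidden), nn.ReLU(), nn.Linear(n_hidden, 10)
    )
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()
    x = torch.randn(64, 32 * 32)        # fake batch, no DataLoader involved
    y = torch.randint(0, 10, (64,))
    for _ in range(10):                 # warm-up iterations
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    start = time.perf_counter()
    for _ in range(n_steps):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return time.perf_counter() - start
```

If both `time_compute_only(200)` and `time_compute_only(10)` are tiny compared to an epoch's wall time, the bottleneck is elsewhere (DataLoader, preprocessing, transfers, or per-step Python/dispatch overhead); on a GPU you would additionally call `torch.cuda.synchronize()` before reading the timer.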


Interesting! I hadn't considered those overheads, as I am usually not focusing on images. Cheers!