GPU memory usage

Hi all,

I have a question about PyTorch GPU memory usage. I am playing with a couple of different architectures for my fully connected network, each of which has n hidden units in each hidden layer, though the number of hidden layers varies. I was expecting the memory footprint of each model to be roughly the same during inference (no backprop). The model itself is quite small and does not take up much GPU memory, but my deeper networks seem to use much more VRAM. Given that they have the same number of hidden units, the matrix multiplications are done on matrices of the same size, so I was expecting all the models to use roughly the same amount of VRAM. What am I missing here? Why do the deeper networks need more memory?
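
In case it helps, this is roughly how I am comparing the models; the layer sizes, batch size, and depths below are just placeholders for my actual setup:

```python
import torch
import torch.nn as nn

def make_mlp(depth, n_hidden=1024, n_in=1024, n_out=10):
    # fully connected net with `depth` hidden layers of n_hidden units each
    layers = [nn.Linear(n_in, n_hidden), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(n_hidden, n_hidden), nn.ReLU()]
    layers.append(nn.Linear(n_hidden, n_out))
    return nn.Sequential(*layers)

x = torch.randn(64, 1024, device="cuda")

for depth in (2, 16):  # shallow vs. deep, same hidden width
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    model = make_mlp(depth).cuda()
    with torch.no_grad():  # inference only, no autograd graph
        out = model(x)
    peak_mb = torch.cuda.max_memory_allocated() / 1024**2
    print(f"depth={depth:2d}  peak allocated: {peak_mb:.1f} MB")
    del model, out
```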

Thanks in advance!

Could you give an example of a model which uses little memory and one which uses more memory?
I’m not sure I understand the question completely, but since each layer has its own parameters, the more layers you create, the more memory will be used.
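
As a rough sketch of what I mean (the layer width here is just an illustration), the parameter memory alone grows linearly with the number of stacked nn.Linear layers:

```python
import torch.nn as nn

def param_mb(model):
    # total size of all parameters in MB (float32 parameters take 4 bytes each)
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1024**2

for depth in (2, 8, 32):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(1024, 1024), nn.ReLU()]
    model = nn.Sequential(*layers)
    print(f"{depth} hidden layers -> {param_mb(model):.1f} MB of parameters")
```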