Load computations into GPU memory one at a time

It seems that DL libraries load the whole model onto the GPU. To reduce GPU memory usage, could I instead load each matrix multiply onto the GPU one at a time?
I know this will be less efficient in terms of speed. How would I go about it? Do I just manually call .cpu() or .cuda() after each operation?
Thanks.

Hi,

Yes, the simplest way to do this would be to call .cuda() before the operation and .cpu() just after.
That being said, this may well end up slower than running the whole thing on the CPU: every operation now has to copy its inputs to the GPU and its result back. Whether it pays off depends a lot on how much computation each operation performs on the GPU compared to those copies.
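A minimal sketch of that approach in PyTorch, assuming inference only (no backward pass) and a model that is either a plain chain of matrix multiplies or a sequence of nn.Module layers. The function names offloaded_matmul_chain and offloaded_layers are hypothetical. It uses .to(device) rather than .cuda() so the same code also runs on a machine without a GPU.

```python
import torch
import torch.nn as nn


@torch.no_grad()  # inference only: autograd would keep the GPU copies alive for backward
def offloaded_matmul_chain(x, weights, device):
    """Compute x @ W1 @ W2 @ ..., keeping everything in host memory except
    the operands of the matmul currently being executed."""
    for w in weights:
        w_dev = w.to(device)        # equivalent to w.cuda() when device == "cuda"
        x_dev = x.to(device)
        x = (x_dev @ w_dev).cpu()   # copy the result back to host memory
        del w_dev, x_dev            # drop the device references before the next step
    return x


@torch.no_grad()
def offloaded_layers(x, layers, device):
    """Run a sequence of nn.Modules, moving each one to `device` only while it runs."""
    for layer in layers:
        layer.to(device)            # nn.Module.to moves the parameters in place
        x = layer(x.to(device)).cpu()
        layer.to("cpu")             # give the layer's device memory back
    return x
```

The no_grad decorator matters here: during training, autograd saves the GPU operands for the backward pass, so they would not actually be freed after each step. If the transfers dominate, pinned host memory (tensor.pin_memory()) combined with .to(device, non_blocking=True) can hide part of the copy cost.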

Thanks :)