I am doing some GAN research and I am running into a problem with memory efficiency.
Both networks together have a size of around 50 MB. However, training with batches of size 3 already uses all 12 GB of my GPU memory. The input consists of 512 x 512 images concatenated with some binary masks. I am aware that autograd needs to keep track of additional things for the backward pass, but this still seems like far too much to me.
I couldn’t find any useful memory efficiency tips, so any advice would be appreciated.
The problem is that you need to do backprop, and the memory this requires is far from negligible. In general, every intermediate activation has to be kept around so that gradients can be computed during the backward pass, and that is where most of the memory goes. Your 50 MB figure only counts the parameters; the stored activations scale with batch size, spatial resolution, and network depth, so for 512 x 512 inputs they can easily dominate.
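To see why activations dwarf the parameter count, here is a back-of-envelope estimate. The layer shapes below are made up for illustration (the original post doesn't describe the architectures), but the arithmetic shows how a few GB accumulate at 512 x 512 resolution with batch size 3:

```python
# Rough estimate of activation memory stored for the backward pass.
# Layer shapes are hypothetical, not the poster's actual networks.
bytes_per_float = 4  # float32
batch = 3

def feature_map_mib(channels, height, width):
    """Size in MiB of one stored activation tensor for the whole batch."""
    return batch * channels * height * width * bytes_per_float / 2**20

# A single 64-channel feature map at full 512 x 512 resolution:
one_map = feature_map_mib(64, 512, 512)  # 3 * 64 * 512 * 512 * 4 B = 192 MiB

# An encoder keeps many such maps alive until backward() runs.
# Even a modest 30-layer stack at mixed resolutions reaches GBs:
layers = ([(64, 512, 512)] * 4
          + [(128, 256, 256)] * 8
          + [(256, 128, 128)] * 18)
total = sum(feature_map_mib(c, h, w) for c, h, w in layers)
print(f"one map: {one_map:.0f} MiB, total activations: {total:.0f} MiB")
```

And that is per network, before gradients and optimizer state. If this is the bottleneck, trading compute for memory with `torch.utils.checkpoint.checkpoint` (recompute activations during backward instead of storing them) is one common remedy; reducing batch size or resolution are the others.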