DataParallel out of memory

mld284 · June 29, 2017, 6:53pm

Hi PyTorch,
Are there best practices for avoiding out-of-memory errors when using DataParallel? For instance, I have 8 cards, but I think the gradients are all accumulated on one card (card 0 by default), so would it make sense to use 7 cards and set the 8th one as the output device? Or would that be a waste? I’ve tried experimenting with it, but the overhead is so high that I’m not sure I’d get there quickly.

colesbury · June 30, 2017, 8:24pm

Using a card solely to accumulate gradients doesn’t sound like a good idea. What does your model look like?

In many models the activations use more memory than the weights and gradients, so try decreasing your batch size if you’re running out of memory.
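
To make the activations-versus-weights point concrete, here is a back-of-the-envelope sketch in plain Python. The layer shape (a single 3x3 convolution with 64 input and 64 output channels on a 224x224 feature map, stride 1, "same" padding, float32) and the helper names are assumptions chosen purely for illustration. It also ignores optimizer state, workspace buffers, and allocator overhead, so treat the numbers as rough orders of magnitude only.

```python
# Rough float32 memory estimate for ONE hypothetical conv layer.
# All shapes below are illustrative assumptions, not measurements.
BYTES_PER_FLOAT32 = 4


def conv_param_count(c_in, c_out, kernel):
    """Number of weights plus biases in a kernel x kernel convolution."""
    return c_in * c_out * kernel * kernel + c_out


def weight_and_grad_bytes(c_in, c_out, kernel):
    """Bytes for the parameters plus one gradient tensor of the same size."""
    return 2 * conv_param_count(c_in, c_out, kernel) * BYTES_PER_FLOAT32


def activation_bytes(batch_size, c_out, height, width):
    """Bytes for the layer's output kept for the backward pass.

    Assumes stride 1 and 'same' padding, so the output keeps height x width.
    """
    return batch_size * c_out * height * width * BYTES_PER_FLOAT32


if __name__ == "__main__":
    fixed = weight_and_grad_bytes(c_in=64, c_out=64, kernel=3)
    print(f"weights + gradients: {fixed / 2**20:.2f} MiB (independent of batch size)")
    for batch_size in (32, 16, 8):
        act = activation_bytes(batch_size, c_out=64, height=224, width=224)
        print(f"batch {batch_size:>2}: activations ~ {act / 2**20:.0f} MiB")
```

Under these assumptions the parameter and gradient cost stays fixed, while the activation cost scales linearly with the batch size, which is why halving the batch is usually the first thing to try.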

yj_z · September 7, 2018, 9:04am

Did you finally solve this problem? If so, how did you solve it? Thank you.