I am working with large images + DenseNet for a medical experiment. Unfortunately I have to scale the images down to fit them in Titan X memory. If I try full size images, of course the network runs out of GPU memory and stops.
I was wondering which of the following approaches would be more feasible to implement in PyTorch:
- Is there a way to shard a large image into multiple tiles across multiple GPUs? OR
- Is there something like “GPU swap” memory, where a large image resides in main memory and only the portions being convolved get swapped into CUDA memory? Naturally this would be slower, but that's acceptable for my experiments.
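To make the second idea concrete: the key requirement is that each tile carries a halo of border pixels, so that tile-by-tile results stitch together exactly. Here is a minimal NumPy sketch of that tiling arithmetic (my own illustration, not an existing PyTorch API; `conv3x3` and `conv_tiled` are hypothetical names). In a real setup the full image would stay in host RAM and each `patch` would be moved to the GPU before convolving:

```python
import numpy as np

def conv3x3(x, k):
    # "Valid" 3x3 convolution in pure NumPy (no padding).
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(3):
        for j in range(3):
            out += k[i, j] * x[i:i + H - 2, j:j + W - 2]
    return out

def conv_tiled(x, k, tile=8, halo=1):
    # Process the image tile by tile; each tile reads a 1-pixel halo
    # beyond its own region so the stitched output matches a single
    # full-image convolution exactly. On a GPU, only `patch` would be
    # resident in device memory at any moment.
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for r in range(0, H - 2, tile):
        for c in range(0, W - 2, tile):
            rh = min(tile, H - 2 - r)
            cw = min(tile, W - 2 - c)
            patch = x[r:r + rh + 2 * halo, c:c + cw + 2 * halo]
            out[r:r + rh, c:c + cw] = conv3x3(patch, k)
    return out
```

For a single layer this is all there is to it; the difficulty discussed later in this thread is that the halo requirement compounds across stacked layers.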
i can give you a solution, but it’s not for the faint-hearted, and not going to be as performant as single-GPU. are you interested?
Very interested! Tell me more.
@smth if you can share code or pseudo code, I can try to figure it out from there…
it’ll take me an hour to write-up, i’ll do it soon.
Hi Soumith, looking forward to your write up. A bunch of medical image researchers would be interested in your take on the image sharding/tiling problem.
i know i’m long overdue on this folks, i went into a rabbithole when i said an hour. I’m writing a whole library for this now.
Can this work with mini-batches and distributed training?
X-Ray, Tomography, Mammo, Pathology: all these image types don’t fit on a single GPU, so I tried to architect image sharding over a weekend. I soon realized it is a Google/FB-grade engineering challenge: tiling, overlapping convolutions, synchronization, and much more.
So here I am, excited that @smth said he is working on a library. Tell us more!
PS: Hey @alexbellgrande good to see you here!
I’m also keen to see the end of this rabbit hole but don’t know if I have the chutzpah given @smth’s warning earlier.
Thanks for the shout @FuriouslyCurious! Been lurking for some time
Hi Soumith, how are you coming on this? Running into these issues as well.
hey everyone, you’ll probably hate me for dropping the ball on this. Sharding itself was harder than expected, and I never finished it. Sharding for one layer was easy, but as you go across multiple layers, you need special data structures to deal with the borders correctly.
Alternatively, here’s an approach that works well to fit large image-based models and train them effectively:
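The reply is cut off above, so purely as an illustration (not necessarily the approach meant here): a standard way to fit deep models on large inputs is gradient checkpointing via `torch.utils.checkpoint.checkpoint_sequential`, which recomputes activations during the backward pass instead of storing them all:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack of conv blocks; stored activations normally dominate
# GPU memory for large spatial inputs.
model = nn.Sequential(*[
    nn.Sequential(nn.Conv2d(8, 8, 3, padding=1), nn.ReLU())
    for _ in range(8)
])

x = torch.randn(1, 8, 64, 64, requires_grad=True)

# Split the stack into 4 segments: only segment boundaries are kept,
# and activations inside each segment are recomputed on backward,
# trading extra compute for a large memory saving.
out = checkpoint_sequential(model, 4, x)
loss = out.sum()
loss.backward()
```

This keeps memory roughly proportional to one segment's activations rather than the whole network's, at the cost of one extra forward pass worth of compute.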