I am working with large images + DenseNet for a medical experiment. Unfortunately I have to scale the images down to fit them in Titan X memory. If I try full size images, of course the network runs out of GPU memory and stops.
I was wondering which of the following methods would be more feasible to implement in PyTorch:
Is there a way to shard a large image into multiple tiles across multiple GPUs? OR
Is there something like "GPU swap" memory, where a large image can reside in main memory and only the portions being convolved get swapped into CUDA memory? Naturally this will be slower, but that is acceptable for my experiment's purposes.
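To make the tiling idea concrete, here is a minimal, hypothetical sketch of the bookkeeping a sharding scheme needs for a single stride-1 conv layer: each tile must read a halo of extra pixels around the region it writes, so the tiles can be convolved independently (e.g. on different GPUs) and still reproduce the full-image result. The name `tile_ranges` and the dict layout are made up for illustration; this is not from any library.

```python
def tile_ranges(length, tile, kernel):
    """Split a 1-D axis of `length` pixels into tiles of size `tile`,
    padding each tile with a halo of kernel//2 pixels on both sides so a
    stride-1 convolution over the padded tile reproduces the full result.
    Assumes an odd kernel size; 2-D images would apply this per axis."""
    halo = kernel // 2
    ranges = []
    for start in range(0, length, tile):
        end = min(start + tile, length)
        # the read region includes the halo, clamped at the image borders
        read = (max(start - halo, 0), min(end + halo, length))
        ranges.append({"write": (start, end), "read": read})
    return ranges

# a 10-pixel axis, 4-pixel tiles, 3x3 kernel -> halo of 1 pixel per side
for r in tile_ranges(10, 4, 3):
    print(r)
```

Note this only covers one layer; as discussed further down the thread, the halo requirement compounds as you stack layers.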
Hi Soumith, looking forward to your write-up. A bunch of medical image researchers would be interested in your take on the image sharding/tiling problem.
X-Ray, Tomography, Mammo, Pathology: none of these image types fit on a single GPU, so I tried to architect image sharding over a weekend. But I soon realized it is a Google/FB-grade engineering challenge, with tiling, overlapping convolutions, syncs, and much more.
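One way to see why this blows up past a weekend project: the halo each tile needs grows with every conv layer, because each layer enlarges the receptive field. A rough back-of-the-envelope sketch, assuming stride-1 convs with odd kernels (the function name `required_halo` is made up, and strided or dilated layers would change the formula):

```python
def required_halo(kernel_sizes):
    """Total border (in input pixels) a tile must carry so its output
    is still exact after applying all the listed stride-1 conv layers
    in sequence. Each odd kernel k contributes k//2 pixels per side."""
    return sum(k // 2 for k in kernel_sizes)

# a DenseNet-style stack of 3x3 convs: 20 layers already need a
# 20-pixel halo on every side of every tile
print(required_halo([3] * 20))  # -> 20
```

For a deep network on gigapixel pathology slides, that growing overlap means either re-reading large borders per tile or syncing border activations between GPUs after every layer.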
So here I am, now that @smth has said he is working on a library. Tell us more!
hey everyone, you'll probably hate me for dropping the ball on this. Sharding itself was harder than expected, and I never finished it. Sharding for one layer was easy, but as you go across multiple layers, you need special data structures to deal with the borders correctly.
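The border bookkeeping being described can be illustrated with a toy 1-D example: convolve a signal in two halo-padded tiles and check that stitching the tile outputs matches convolving the whole signal at once. This is only a sketch of the idea, using a plain "valid" convolution (a real DenseNet with "same" padding would also need to crop per layer); all names here are made up.

```python
def conv1d(x, k):
    """'Valid' 1-D convolution (really cross-correlation) of list x with kernel k."""
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

x = list(range(12))   # the "large image" (1-D stand-in)
k = [1, 2, 1]         # a 3-tap kernel -> halo of 1 sample per side
full = conv1d(x, k)   # ground truth: convolve the whole signal at once

halo = len(k) // 2
mid = len(x) // 2
left  = conv1d(x[:mid + halo], k)   # left tile carries a halo on its right
right = conv1d(x[mid - halo:], k)   # right tile carries a halo on its left
stitched = left + right             # 'valid' conv already consumed the halos

print("tiled == full:", stitched == full)
```

With one layer this is all it takes; across many layers the halos must either grow (as noted earlier in the thread) or be exchanged between tiles after every layer, which is where the special data structures come in.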
Alternatively, here's an approach that works well to fit models for large images and train them effectively: