I am working with large images + DenseNet for a medical experiment. Unfortunately I have to scale the images down to fit them in Titan X memory. If I try full size images, of course the network runs out of GPU memory and stops.
I was wondering which of the following methods would be more feasible to implement in PyTorch:
Is there a way to shard a large image into multiple tiles across multiple GPUs? OR
Is there something like "GPU swap" memory, where a large image can reside in main memory and only the portions being convolved get swapped into CUDA memory? Naturally this will be slower, but that is acceptable for my experiment's purposes.
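To make the tiling idea concrete, here is a minimal, hypothetical sketch of the bookkeeping a sharding scheme needs for a single stride-1 conv layer: each tile must read a halo of extra pixels around the region it writes, so the tiles can be convolved independently (e.g. on different GPUs) and still reproduce the full-image result. The name `tile_ranges` and the dict layout are made up for illustration; this is not from any library.

```python
def tile_ranges(length, tile, kernel):
    """Split a 1-D axis of `length` pixels into tiles of size `tile`,
    padding each tile with a halo of kernel//2 pixels on both sides so a
    stride-1 convolution over the padded tile reproduces the full result.
    Assumes an odd kernel size; 2-D images would apply this per axis."""
    halo = kernel // 2
    ranges = []
    for start in range(0, length, tile):
        end = min(start + tile, length)
        # the read region includes the halo, clamped at the image borders
        read = (max(start - halo, 0), min(end + halo, length))
        ranges.append({"write": (start, end), "read": read})
    return ranges

# a 10-pixel axis, 4-pixel tiles, 3x3 kernel -> halo of 1 pixel per side
for r in tile_ranges(10, 4, 3):
    print(r)
```

Note this only covers one layer; as discussed further down the thread, the halo requirement compounds as you stack layers.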
Hi Soumith, looking forward to your write-up. A bunch of medical image researchers would be interested in your take on the image sharding/tiling problem.
X-Ray, Tomography, Mammo, Pathology: none of these image types fit on a single GPU, so I tried to architect image sharding over a weekend. But I soon realized it is a Google/FB-grade engineering challenge, with tiling, overlapping convolutions, syncs, and much more.
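One way to see why this blows up past a weekend project: the halo each tile needs grows with every conv layer, because each layer enlarges the receptive field. A rough back-of-the-envelope sketch, assuming stride-1 convs with odd kernels (the function name `required_halo` is made up, and strided or dilated layers would change the formula):

```python
def required_halo(kernel_sizes):
    """Total border (in input pixels) a tile must carry so its output
    is still exact after applying all the listed stride-1 conv layers
    in sequence. Each odd kernel k contributes k//2 pixels per side."""
    return sum(k // 2 for k in kernel_sizes)

# a DenseNet-style stack of 3x3 convs: 20 layers already need a
# 20-pixel halo on every side of every tile
print(required_halo([3] * 20))  # -> 20
```

For a deep network on gigapixel pathology slides, that growing overlap means either re-reading large borders per tile or syncing border activations between GPUs after every layer.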
So here I am, now that @smth has said he is working on a library. Tell us more!
hey everyone, you'll probably hate me for dropping the ball on this. Sharding itself was harder than expected, and I never finished it. Sharding for one layer was easy, but as you go across multiple layers, you need special data structures to deal with the borders correctly.
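The border bookkeeping being described can be illustrated with a toy 1-D example: convolve a signal in two halo-padded tiles and check that stitching the tile outputs matches convolving the whole signal at once. This is only a sketch of the idea, using a plain "valid" convolution (a real DenseNet with "same" padding would also need to crop per layer); all names here are made up.

```python
def conv1d(x, k):
    """'Valid' 1-D convolution (really cross-correlation) of list x with kernel k."""
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

x = list(range(12))   # the "large image" (1-D stand-in)
k = [1, 2, 1]         # a 3-tap kernel -> halo of 1 sample per side
full = conv1d(x, k)   # ground truth: convolve the whole signal at once

halo = len(k) // 2
mid = len(x) // 2
left  = conv1d(x[:mid + halo], k)   # left tile carries a halo on its right
right = conv1d(x[mid - halo:], k)   # right tile carries a halo on its left
stitched = left + right             # 'valid' conv already consumed the halos

print("tiled == full:", stitched == full)
```

With one layer this is all it takes; across many layers the halos must either grow (as noted earlier in the thread) or be exchanged between tiles after every layer, which is where the special data structures come in.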
Alternatively, here's an approach that works well to fit models for large images and train them effectively: