nn.DistributedDataParallel with batch_size = 1?


I am using a Multiple Instance Learning model with so-called bags, each consisting of a large number of big images (patches), e.g. batch_size, nr_patches, nr_channels, height, width = (1, 900, 3, 598, 598). These patches share the same label and are considered one input/image. Since nr_patches differs between bags, and due to memory constraints and the way the model aggregates the patches, I can only use a batch size of 1.

I wanted to use nn.DistributedDataParallel to parallelize the computation of a single batch across multiple GPUs. Is that possible? Does nn.DistributedDataParallel parallelize mini-batches across multiple GPUs even if batch_size = 1? I am really hoping for a response from your side.

In the standard DDP setup (one process per GPU), each process is responsible for loading and processing its own input batch. To avoid loading duplicate samples, a DistributedSampler is used.
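To illustrate the standard setup, here is a minimal sketch of how DistributedSampler splits sample indices between ranks. The toy BagDataset and the explicit num_replicas/rank arguments are assumptions for demonstration; in a real DDP run these would come from the initialized process group:

```python
from torch.utils.data import DataLoader, Dataset, DistributedSampler

class BagDataset(Dataset):  # hypothetical toy dataset of 8 bags
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        return idx  # would normally load and return one full bag

# one sampler per process; rank/num_replicas are normally read from the
# initialized process group, passed explicitly here so the sketch runs standalone
sampler0 = DistributedSampler(BagDataset(), num_replicas=2, rank=0, shuffle=False)
loader0 = DataLoader(BagDataset(), batch_size=1, sampler=sampler0)
print([b.item() for b in loader0])  # rank 0 sees indices [0, 2, 4, 6]
```

Each rank thus sees a disjoint subset of sample indices, which is what avoids the duplicate loading.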
You could set the batch size to 1 in each worker and load different “patches” instead. However, you would need to implement this logic in the Dataset’s __getitem__ method as well as in the sampler, to avoid loading the same patches twice.
In the default use case, Dataset.__getitem__ receives an index from the sampler and is responsible for loading and processing one sample using that index. The collate_fn then stacks the individual samples into an input batch.
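As a quick sketch of this default pipeline (the ToyDataset and its shapes are made up for illustration):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):  # hypothetical example
    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # the sampler passes `idx`; load and process one sample here
        return torch.full((3,), float(idx))

loader = DataLoader(ToyDataset(), batch_size=2)  # default collate_fn stacks samples
batch = next(iter(loader))
print(batch.shape)  # torch.Size([2, 3])
```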
In your use case you might need to pass two indices to the __getitem__ method (using a custom sampler), since you now have to determine which sample to load and, within that sample, which patches.
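This two-index idea could look roughly like the following. PatchShardSampler, BagDataset, and the patch counts are all hypothetical names and numbers; the real loading logic and shapes depend on your data:

```python
import torch
from torch.utils.data import DataLoader, Dataset, Sampler

class PatchShardSampler(Sampler):
    """Hypothetical sampler: yields (bag_idx, patch_indices) pairs so that
    each rank loads a disjoint chunk of patches of the same bag."""
    def __init__(self, patches_per_bag, num_replicas, rank):
        self.patches_per_bag = patches_per_bag  # e.g. {bag_idx: nr_patches}
        self.num_replicas = num_replicas
        self.rank = rank

    def __iter__(self):
        for bag_idx, nr_patches in self.patches_per_bag.items():
            # split this bag's patch indices into num_replicas chunks
            chunks = torch.arange(nr_patches).chunk(self.num_replicas)
            yield bag_idx, chunks[self.rank].tolist()

    def __len__(self):
        return len(self.patches_per_bag)

class BagDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, index):
        bag_idx, patch_indices = index  # the two "indices" from the sampler
        # here you would load only these patches of bag `bag_idx` from disk;
        # a tiny 8x8 placeholder stands in for the real 598x598 patches
        return torch.randn(len(patch_indices), 3, 8, 8)

# pretend this process is rank 0 of 4
sampler = PatchShardSampler({0: 900, 1: 750}, num_replicas=4, rank=0)
loader = DataLoader(BagDataset(), sampler=sampler, batch_size=1)
batch = next(iter(loader))
print(batch.shape)  # torch.Size([1, 225, 3, 8, 8])
```

Each rank would then run the forward pass on its own chunk of patches; note that you would still need to decide how the per-rank aggregations are combined, since DDP only averages gradients across ranks.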
I might be missing some details, so let me know if this approach makes sense to you.