How to allocate different amounts of memory to multiple GPUs while training?

Suppose I have two GPUs, GPU-0 and GPU-1 (they are the same type). I want to train a simple classification network (e.g. ResNet) on them. For some special reasons, I would like GPU-0 to use more memory.

For example, with the batch size set to 64, I would like about 40 samples to be placed on GPU-0 and the remaining 24 on GPU-1.

I am guessing this cannot be done via nn.DataParallel or nn.DistributedDataParallel, right? To do this, I think I need to copy the model and data manually to GPU-0 and GPU-1, then merge the computed losses together.
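For what it's worth, here is a minimal sketch of the manual approach described above: split one batch unevenly with `torch.split`, run each chunk through its own model replica, and combine the losses. It runs on CPU for illustration; on a real machine you would put the replicas on `'cuda:0'` and `'cuda:1'` (the device names, the `nn.Linear` stand-in for ResNet, and the 40/24 split are assumptions for this sketch, not a prescribed API).

```python
import copy
import torch
import torch.nn as nn

devices = ["cpu", "cpu"]      # e.g. ["cuda:0", "cuda:1"] on a 2-GPU machine
split_sizes = [40, 24]        # uneven shares of the 64-sample batch

model = nn.Linear(10, 2)      # stand-in for ResNet
replicas = [copy.deepcopy(model).to(d) for d in devices]

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))
x_chunks = torch.split(x, split_sizes)  # tensors of size 40 and 24
y_chunks = torch.split(y, split_sizes)

# Sum the per-sample losses on each device, then divide by the full batch
# size so the result equals the mean loss over all 64 samples.
criterion = nn.CrossEntropyLoss(reduction="sum")
total_loss = 0.0
for rep, d, xc, yc in zip(replicas, devices, x_chunks, y_chunks):
    total_loss = total_loss + criterion(rep(xc.to(d)), yc.to(d)).to(devices[0])
total_loss = total_loss / 64
total_loss.backward()

# Each replica now holds gradients from its own chunk; merge them into the
# primary replica before the optimizer step.
for p0, p1 in zip(replicas[0].parameters(), replicas[1].parameters()):
    p0.grad += p1.grad.to(devices[0])
```

After the merge you would step an optimizer on `replicas[0]` and copy its updated weights back to the second replica, which is the bookkeeping that DataParallel normally hides from you.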

I am pretty unfamiliar with distributed training in PyTorch and have failed to find a proper tutorial. A related question was raised here; however, its objective is quite different.

Could anyone illustrate this problem with an example? Thanks in advance.

I would guess DistributedDataParallel might work for such a use case, since each process defines its own data loading pipeline.
You might not be able to use the DistributedSampler, but you could try to load the desired dataset subsets manually with different batch sizes. Let me know if this would work for you.
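To make the suggestion above concrete, here is a sketch of how each DDP process could build its own loader over an unequal dataset shard with an unequal batch size, in place of DistributedSampler. The helper names (`shard_indices`, `make_loader`) and the 40/24 shares are my own assumptions, not part of the PyTorch API.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

def shard_indices(n, shares, rank):
    """Contiguous index shard for `rank`, sized proportionally to `shares`."""
    total = sum(shares)
    bounds = [0]
    for s in shares:
        bounds.append(bounds[-1] + n * s // total)
    bounds[-1] = n  # absorb any rounding remainder into the last shard
    return list(range(bounds[rank], bounds[rank + 1]))

def make_loader(dataset, rank, shares=(40, 24)):
    # Each rank gets a proportional slice of the data AND a matching
    # per-rank batch size (40 on rank 0, 24 on rank 1).
    subset = Subset(dataset, shard_indices(len(dataset), shares, rank))
    return DataLoader(subset, batch_size=shares[rank], shuffle=True)

# Inside each spawned process you would then do (untested sketch):
#   dist.init_process_group("nccl", rank=rank, world_size=2)
#   ddp_model = DistributedDataParallel(model.to(rank), device_ids=[rank])
#   loader = make_loader(train_set, rank)

# Small CPU demonstration with a toy dataset:
train_set = TensorDataset(torch.randn(6400, 3), torch.zeros(6400))
loader0 = make_loader(train_set, rank=0)
batch0, _ = next(iter(loader0))
```

One caveat worth keeping in mind: DDP averages gradients equally across processes regardless of per-rank batch size, so with unequal batches the effective per-sample weighting differs slightly between the two GPUs; you may want to rescale the per-rank loss accordingly.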