Correct GPU memory allocation


I have a server equipped with three Quadro P2000s and one GTX 1050. The Quadros have 5 GB of video memory each and the GTX has 4 GB. I use DataParallel for multi-GPU training, but I noticed that the usable memory per GPU is limited by the 4 GB card: when I try to increase the batch size, I get an OOM on the GTX.
For example, with batch size 50 the GTX memory is full, but each Quadro uses only 4 GB instead of 5 GB, so I have 3 GB of unused GPU RAM in total. And when I increase the batch size to 56, I get an OOM because the GTX memory is full.

So the question is: what is the correct way to use all available GPU memory?

I use this code to parallelize training:

net = model()
if torch.cuda.device_count() > 1:
    net = torch.nn.DataParallel(net, device_ids=list(range(torch.cuda.device_count())))
net = net.to('cuda')

P.S. The GTX is device index 0.

Thank you for your support!

I believe DataParallel assumes identical or very similar GPUs and gives the _check_balance warning otherwise; see:
torch.nn.parallel.data_parallel — PyTorch 1.10.0 documentation

However, you may want to check whether using the 1050 is beneficial from a load-balancing perspective at all, as it may bottleneck training compared to the P2000s even when accounting for the difference in available memory.
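If you decide the 1050 isn't worth it, one option is to hand DataParallel only the Quadro indices so every replica has the same 5 GB budget. A minimal sketch, assuming the P2000s are devices 1–3 as stated above (the `nn.Linear` is just a stand-in for your `model()`):

```python
import torch
import torch.nn as nn

net = nn.Linear(10, 2)  # stand-in for the original model()

if torch.cuda.device_count() > 3:
    # Skip device 0 (the 4 GB GTX 1050) and use only the 5 GB P2000s.
    p2000_ids = [1, 2, 3]  # assumed indices, per the post above
    net = nn.DataParallel(net, device_ids=p2000_ids)
    # Parameters and inputs must live on the first device in device_ids.
    net = net.to(f"cuda:{p2000_ids[0]}")
```

Alternatively, setting `CUDA_VISIBLE_DEVICES=1,2,3` before launching hides the GTX from PyTorch entirely, so the original code works unchanged.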

Also take a look at this post:

Seems like DistributedDataParallel is the solution. Thanks a lot! I’ll try this.
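For reference, here is a minimal DistributedDataParallel sketch, assuming a launch via `torchrun --nproc_per_node=4 train.py` on an NCCL-capable box; the `nn.Linear` is a stand-in for the original `model()` and the training loop is elided:

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train():
    # torchrun sets LOCAL_RANK (and the rendezvous env vars) per process.
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    net = nn.Linear(10, 2).to(local_rank)  # stand-in for model()
    net = DDP(net, device_ids=[local_rank])

    # ... per-process training loop: each rank loads its own data shard
    # (e.g. via DistributedSampler) and gradients are all-reduced in backward.

    dist.destroy_process_group()


if __name__ == "__main__" and "LOCAL_RANK" in os.environ:
    train()
```

Unlike DataParallel, each process owns one GPU and its own batch, so there is no single "scatter" device, though with a uniform per-rank batch size the 4 GB card would still OOM first.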

Thank you for your reply! It looks like regular DataParallel is not the best way to do multi-GPU training. Thank you!