I am working on graph similarity prediction with the SimGNN model. Since SimGNN takes a pair of graphs as input, I have not found a way to use PyTorch’s DataLoader to batch multiple graph pairs efficiently, so each forward pass processes a single pair. As a result, GPU utilization sits at only around 10% per GPU, even though I am training across 4 GPUs.
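For context, my current training setup looks roughly like the sketch below. `SimGNN`, `train_pairs`, and `loss_fn` are placeholders for my actual model, dataset of graph pairs, and loss function:

```python
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank, world_size, train_pairs):
    # One process per GPU, standard DDP setup.
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    model = DDP(SimGNN().to(rank), device_ids=[rank])  # SimGNN is my model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(10):
        for g1, g2, target in train_pairs:           # one graph pair at a time
            optimizer.zero_grad()
            pred = model(g1.to(rank), g2.to(rank))    # forward on a single pair
            loss = loss_fn(pred, target.to(rank))
            loss.backward()                           # DDP all-reduces gradients here
            optimizer.step()
```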
To improve GPU utilization, I attempted to run multiple processes in parallel on each GPU, so that each GPU could train multiple pairs of graphs at the same time. However, I encountered the following issues:
- I tried using PyTorch’s multiprocessing, but in the DDP (DistributedDataParallel) environment the processes cannot communicate properly during backpropagation.
- Running multiple processes on each GPU seems to conflict with PyTorch’s DDP, preventing inter-process communication across GPUs (a rough sketch of this attempt is below).
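Roughly, the failed attempt looked like this (again with `SimGNN` as a placeholder; the rank-to-device mapping is where things seem to break down):

```python
import torch
import torch.multiprocessing as mp
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

NUM_GPUS = 4
PROCS_PER_GPU = 2                      # two training processes sharing each GPU
WORLD_SIZE = NUM_GPUS * PROCS_PER_GPU

def worker(rank):
    # Several ranks end up mapped to the same CUDA device.
    device = rank % NUM_GPUS
    dist.init_process_group("nccl", rank=rank, world_size=WORLD_SIZE)
    torch.cuda.set_device(device)

    model = DDP(SimGNN().to(device), device_ids=[device])
    # ... training loop as in the first sketch; backward() hangs or errors
    # during the gradient all-reduce once multiple ranks share one device.

if __name__ == "__main__":
    mp.spawn(worker, nprocs=WORLD_SIZE)
```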
My Goal:
I want to launch multiple processes per GPU so that each GPU can process multiple graph pairs in parallel and achieve higher utilization.
Are there any efficient ways to train multiple pairs of graphs in parallel on each GPU? Alternatively, are there other ways to improve GPU utilization in this scenario?