PyTorch Not Automatically Utilizing Multiple GPUs

I am working on a Lambda Vector machine with 2 NVIDIA RTX 6000 Ada GPUs. However, PyTorch uses only one GPU by default, and I have to manually split the workload to run on both GPUs.

I expected PyTorch to parallelize computations across both GPUs automatically when using DataParallel() or DistributedDataParallel(), but the workload remains restricted to a single GPU.

Are there any specific system configurations or PyTorch settings I need to adjust?
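As a first sanity check (a minimal sketch using the standard torch.cuda API), this is how one can confirm that both GPUs are actually visible to PyTorch; if CUDA_VISIBLE_DEVICES hides a device, no PyTorch setting will bring it back:

```python
import torch

# Report what PyTorch can see. On a two-GPU machine this should print
# a device count of 2 and both device names.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```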

Thank you.

The DDP tutorial shows how multiple GPUs can be used with DDP. Did you follow this example, or how are you executing your script?
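For reference, a minimal DDP skeleton along the lines of that tutorial might look as follows. This is a sketch, not your exact script: under `torchrun --nproc_per_node=2 train.py` the rank/world-size environment variables are set automatically; the defaults below only let the sketch run as a single CPU process with the gloo backend.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets these itself; the defaults make the sketch runnable
# as a single CPU process for illustration.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Use the "nccl" backend when each rank owns a GPU.
dist.init_process_group(backend="gloo")

model = nn.Linear(16, 16)  # toy model for illustration
ddp_model = DDP(model)     # replicates the model on each rank

out = ddp_model(torch.randn(4, 16))
print(out.shape)

dist.destroy_process_group()
```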

Thank you for sharing the resources. I haven’t used Distributed Data Parallel (DDP) for machine learning before. I was trying to parallelize matrix multiplication for very large matrices (N > 10,000). For this, I defined the operation as a model taking (N, N) inputs and then passed it through torch.nn.DataParallel().
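A guess at the setup described above (the module name and sizes are illustrative): a module whose forward is just a matmul, wrapped in DataParallel. Note that DataParallel only splits dimension 0 of the input batch across GPUs; it does not shard the matmul itself.

```python
import torch
import torch.nn as nn

class MatMul(nn.Module):
    """Toy module whose forward pass is a single matmul."""
    def __init__(self, n):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n, n))

    def forward(self, x):
        return x @ self.weight

n = 256  # small here; the original case used N > 10,000
model = MatMul(n)

if torch.cuda.device_count() > 1:
    # Replicates the module per GPU and scatters x along dim 0.
    model = nn.DataParallel(model).cuda()

x = torch.randn(n, n)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)
print(out.shape)
```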

DistributedDataParallel will not shard matmuls or other operations. It parallelizes the workload by cloning the model onto each device (rank) and executing the forward/backward passes in parallel on each rank with different data samples.
If you want to apply model-sharding approaches, you might want to check e.g. Tensor Parallelism.
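To make the distinction concrete, here is a minimal hand-rolled sketch of the idea behind tensor parallelism for a matmul (not the torch.distributed.tensor.parallel API): the right-hand matrix is split column-wise, each device computes a partial result, and the partials are concatenated. It falls back to CPU when fewer than two GPUs are visible.

```python
import torch

n = 512
A = torch.randn(n, n)
B = torch.randn(n, n)

# Pick two devices; degrade to CPU so the sketch runs anywhere.
if torch.cuda.device_count() >= 2:
    devices = ["cuda:0", "cuda:1"]
else:
    devices = ["cpu", "cpu"]

# Column-shard B: each shard is (n, n/2), each device computes
# its own slice of the output columns over the full inner dimension.
shards = B.chunk(2, dim=1)
partials = [
    (A.to(d) @ s.to(d)).to("cpu") for d, s in zip(devices, shards)
]
C = torch.cat(partials, dim=1)

# Matches the unsharded matmul.
print(torch.allclose(C, A @ B, atol=1e-4))
```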

Thank you for the clarification. I went through the documentation and had one question: does Tensor Parallelism automatically synchronize all computations in the correct order, or do I need to handle synchronization manually to prevent race conditions or stale data?