PyTorch Not Automatically Utilizing Multiple GPUs

I am working on a Lambda Vector machine with 2 NVIDIA RTX 6000 Ada GPUs. However, PyTorch uses only one GPU by default, and I have to manually split the workload to run on both GPUs.

I expected PyTorch to parallelize computations across both GPUs automatically when using DataParallel() or DistributedDataParallel(), but the workload remains restricted to a single GPU.

Are there any specific system configurations or PyTorch settings I need to adjust?
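As a first sanity check (a minimal sketch using the standard torch.cuda API), this is how one can confirm that both GPUs are actually visible to PyTorch; if CUDA_VISIBLE_DEVICES hides a device, no PyTorch setting will bring it back:

```python
import torch

# Report what PyTorch can see. On a two-GPU machine this should print
# a device count of 2 and both device names.
print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```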

Thank you.

The DDP tutorial shows how multiple GPUs can be used with DDP. Did you follow this example, or how are you executing your script?
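For reference, a minimal DDP skeleton along the lines of that tutorial might look as follows. This is a sketch, not your exact script: under `torchrun --nproc_per_node=2 train.py` the rank/world-size environment variables are set automatically; the defaults below only let the sketch run as a single CPU process with the gloo backend.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# torchrun sets these itself; the defaults make the sketch runnable
# as a single CPU process for illustration.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

# Use the "nccl" backend when each rank owns a GPU.
dist.init_process_group(backend="gloo")

model = nn.Linear(16, 16)  # toy model for illustration
ddp_model = DDP(model)     # replicates the model on each rank

out = ddp_model(torch.randn(4, 16))
print(out.shape)

dist.destroy_process_group()
```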

Thank you for sharing the resources. I haven’t used Distributed Data Parallel (DDP) for machine learning before. I was trying to parallelize matrix multiplication for very large matrices (N > 10,000). For this, I defined the operation as a model taking (N, N) inputs and then passed it through torch.nn.DataParallel().
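A guess at the setup described above (the module name and sizes are illustrative): a module whose forward is just a matmul, wrapped in DataParallel. Note that DataParallel only splits dimension 0 of the input batch across GPUs; it does not shard the matmul itself.

```python
import torch
import torch.nn as nn

class MatMul(nn.Module):
    """Toy module whose forward pass is a single matmul."""
    def __init__(self, n):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n, n))

    def forward(self, x):
        return x @ self.weight

n = 256  # small here; the original case used N > 10,000
model = MatMul(n)

if torch.cuda.device_count() > 1:
    # Replicates the module per GPU and scatters x along dim 0.
    model = nn.DataParallel(model).cuda()

x = torch.randn(n, n)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)
print(out.shape)
```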

DistributedDataParallel will not shard matmuls or other operations. It parallelizes the workload by cloning the model onto each device (rank) and executing the forward/backward passes in parallel on each rank with different data samples.
If you want to apply model-sharding approaches, you might want to check e.g. Tensor Parallelism.
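To make the distinction concrete, here is a minimal hand-rolled sketch of the idea behind tensor parallelism for a matmul (not the torch.distributed.tensor.parallel API): the right-hand matrix is split column-wise, each device computes a partial result, and the partials are concatenated. It falls back to CPU when fewer than two GPUs are visible.

```python
import torch

n = 512
A = torch.randn(n, n)
B = torch.randn(n, n)

# Pick two devices; degrade to CPU so the sketch runs anywhere.
if torch.cuda.device_count() >= 2:
    devices = ["cuda:0", "cuda:1"]
else:
    devices = ["cpu", "cpu"]

# Column-shard B: each shard is (n, n/2), each device computes
# its own slice of the output columns over the full inner dimension.
shards = B.chunk(2, dim=1)
partials = [
    (A.to(d) @ s.to(d)).to("cpu") for d, s in zip(devices, shards)
]
C = torch.cat(partials, dim=1)

# Matches the unsharded matmul.
print(torch.allclose(C, A @ B, atol=1e-4))
```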

Thank you for the clarification. I went through the documentation and had one question: does Tensor Parallelism automatically synchronize all computations in the correct order, or do I need to handle synchronization manually to prevent race conditions or stale data?