Unable to run code on Multiple GPUs in PyTorch - Usage shows only 1 GPU is being utilized

I am training a Transformer encoder-decoder model for text summarization. The code runs without errors, but nvidia-smi shows that only 1 GPU is being used, even though I want to run it on all available GPUs (I can access as many as I want). I have wrapped my model in DataParallel.
Here’s how I have wrapped the model:

    if torch.cuda.device_count() > 1:
        print(f"Using {torch.cuda.device_count()} GPUs")
        model = nn.DataParallel(model)

    model.to(device)

This is how I am calling the functions from my model:

            if torch.cuda.device_count() > 1:
                encoder_output = model.module.encode(
                    encoder_input, encoder_mask
                )  # (B, input_len, d_model)
                decoder_output = model.module.decode(
                    encoder_output, encoder_mask, decoder_input, decoder_mask
                )  # (B, seq_len, d_model)
                proj_output = model.module.project(
                    decoder_output
                ) 

I am using Python 3.12 and PyTorch 2.3.0.

An MWE is available on GitHub.

I have also checked that the GPUs are configured correctly: I tried this example from the PyTorch documentation to verify that my GPU setup works.
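
Roughly, the kind of check involved (a minimal sketch, not the exact example from the docs):

    import torch

    # Confirm that PyTorch sees all GPUs before worrying about DataParallel.
    print(torch.cuda.is_available())         # True if CUDA is usable
    print(torch.cuda.device_count())         # should match what nvidia-smi reports
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))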

In your code snippets you first wrap the model in the deprecated nn.DataParallel module, but then bypass it by accessing the internal .module attribute, so the calls run on the underlying model on a single GPU.
Call the wrapped model directly and let DataParallel handle the data splits as well as the model replicas before it invokes forward().
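
Something like the following sketch, with a hypothetical `TransformerSummarizer` standing in for your model (the `encode`/`decode`/`project` bodies here are just placeholders): move the sequence of calls into `forward()` and call the wrapped model itself.

    import torch
    import torch.nn as nn

    class TransformerSummarizer(nn.Module):
        # Hypothetical stand-in for the model in the question; the real
        # encode/decode/project would contain the Transformer blocks.
        def __init__(self, vocab_size: int = 32000, d_model: int = 512):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            self.proj = nn.Linear(d_model, vocab_size)

        def encode(self, encoder_input, encoder_mask):
            return self.embed(encoder_input)          # (B, input_len, d_model)

        def decode(self, encoder_output, encoder_mask, decoder_input, decoder_mask):
            return self.embed(decoder_input)          # (B, seq_len, d_model)

        def project(self, decoder_output):
            return self.proj(decoder_output)          # (B, seq_len, vocab_size)

        def forward(self, encoder_input, encoder_mask, decoder_input, decoder_mask):
            # Everything that should run on every GPU replica goes through forward().
            encoder_output = self.encode(encoder_input, encoder_mask)
            decoder_output = self.decode(encoder_output, encoder_mask,
                                         decoder_input, decoder_mask)
            return self.project(decoder_output)

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TransformerSummarizer()
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicates the module, splits the batch dim
    model.to(device)

    # Dummy batch just to show the call; shapes follow the comments above.
    B, input_len, seq_len = 8, 128, 64
    encoder_input = torch.randint(0, 32000, (B, input_len), device=device)
    decoder_input = torch.randint(0, 32000, (B, seq_len), device=device)
    encoder_mask = torch.ones(B, 1, 1, input_len, device=device)
    decoder_mask = torch.ones(B, 1, seq_len, seq_len, device=device)

    # Call the wrapped model directly: DataParallel scatters the inputs across
    # the GPUs, runs forward() on each replica, and gathers the outputs.
    proj_output = model(encoder_input, encoder_mask, decoder_input, decoder_mask)
    print(proj_output.shape)  # torch.Size([8, 64, 32000])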


OMG, that was a horrible thing to miss. Routing the calls through forward() resolved the issue.