I am training a Transformer encoder-decoder model for text summarization. The code runs without errors, but nvidia-smi shows it using only one GPU. I want it to run on all available GPUs (I can access as many as I need), so I wrapped the model in nn.DataParallel.
Here’s how I have wrapped the model:
if torch.cuda.device_count() > 1:
    print(f"Using {torch.cuda.device_count()} GPUs")
    model = nn.DataParallel(model)
model.to(device)
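For reference, my understanding from the nn.DataParallel docs is that the wrapper only splits work across GPUs when the batch goes through the wrapper's own forward call, e.g. (toy module, not my actual model):

import torch
import torch.nn as nn

# Toy example of the call pattern nn.DataParallel parallelizes:
# calling the *wrapper* scatters dim 0 of the input across GPUs,
# runs a replica of the module on each, and gathers the outputs.
toy = nn.DataParallel(nn.Linear(512, 512)).to("cuda")
x = torch.randn(64, 512, device="cuda")
y = toy(x)  # the 64-row batch is split across the visible GPUs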
This is how I call the model's methods:
if torch.cuda.device_count() > 1:
    encoder_output = model.module.encode(
        encoder_input, encoder_mask
    )  # (B, input_len, d_model)
    decoder_output = model.module.decode(
        encoder_output, encoder_mask, decoder_input, decoder_mask
    )  # (B, seq_len, d_model)
    proj_output = model.module.project(decoder_output)
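My suspicion is that calling through model.module bypasses the DataParallel wrapper entirely (only the wrapper's own forward scatters the batch), so these calls end up on a single device. Is routing the three steps through a single forward, roughly like the sketch below, the intended fix? (Seq2SeqWrapper is a name I made up; the encode/decode/project signatures are taken from my snippet above.)

import torch.nn as nn

# Hypothetical sketch, not my actual code: route encode/decode/project
# through one forward so nn.DataParallel can scatter the batch.
class Seq2SeqWrapper(nn.Module):
    def __init__(self, transformer):
        super().__init__()
        self.transformer = transformer

    def forward(self, encoder_input, encoder_mask, decoder_input, decoder_mask):
        encoder_output = self.transformer.encode(encoder_input, encoder_mask)
        decoder_output = self.transformer.decode(
            encoder_output, encoder_mask, decoder_input, decoder_mask
        )
        return self.transformer.project(decoder_output)

# model = nn.DataParallel(Seq2SeqWrapper(transformer)).to(device)
# proj_output = model(encoder_input, encoder_mask, decoder_input, decoder_mask)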
I am using Python 3.12 and PyTorch 2.3.0.
An MWE is available on GitHub.
I have also checked that the GPUs themselves are configured correctly by running the DataParallel example from the PyTorch documentation, and that example does use all GPUs.
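Condensed, that sanity check is essentially the following (my own shortened version of the tutorial's pattern, with made-up sizes):

import torch
import torch.nn as nn

# Trivial module that reports the per-replica slice it receives.
class Probe(nn.Module):
    def forward(self, x):
        print("replica on", x.device, "got batch of", x.size(0))
        return x

probe = nn.DataParallel(Probe()).to("cuda")
_ = probe(torch.randn(32, 8, device="cuda"))
# With N GPUs this prints N lines, each replica receiving a slice
# of the 32-row batch, so the multi-GPU setup itself works.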