Hi,
I need to run inference with the same model on multiple GPUs inside a Docker container: each GPU should process a different input batch in parallel, using its own replica of the model.
However, the GPUs don't seem to be fully utilized: while one GPU is processing, the other sits idle. I'd like both GPUs to be busy at the same time, with the workload distributed evenly between them.
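For context, my current loop looks roughly like the following (simplified; resnet18 and the random tensors just stand in for my actual model and data):

```python
import torch
import torchvision

# Simplified version of my current loop (resnet18 and the random batches are
# placeholders for my real model and data). I'm not sure whether the
# sequential forward calls on a single Python thread are what keeps one GPU
# idle while the other one works.
model0 = torchvision.models.resnet18().eval().to('cuda:0')
model1 = torchvision.models.resnet18().eval().to('cuda:1')

batches = [torch.randn(32, 3, 224, 224) for _ in range(10)]

with torch.no_grad():
    for i in range(0, len(batches), 2):
        out0 = model0(batches[i].to('cuda:0')).cpu()
        out1 = model1(batches[i + 1].to('cuda:1')).cpu()
```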
To clarify, my goal is to:
- Replicate the same model across multiple GPUs.
- Distribute input batches across GPUs in parallel.
- Ensure both GPUs are actively processing during inference, without one sitting idle (see the sketch after this list).
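To make the question concrete, here is a minimal sketch of the kind of setup I think I need, again with resnet18 and random tensors as placeholders: one model replica and one worker thread per GPU, with the input batches split round-robin across devices.

```python
import threading
import torch
import torchvision

# Minimal sketch of the intended setup (resnet18 and the random batches are
# placeholders for my real model and data). Each GPU gets its own replica of
# the model and its own thread, so the forward passes can run concurrently.
devices = ['cuda:0', 'cuda:1']
models = [torchvision.models.resnet18().eval().to(d) for d in devices]

def run_inference(model, device, batches, results, idx):
    # Process every batch assigned to this GPU and store the outputs
    # in the slot reserved for this worker.
    with torch.no_grad():
        results[idx] = [model(b.to(device)) for b in batches]

# Split the input batches round-robin across the available GPUs.
all_batches = [torch.randn(32, 3, 224, 224) for _ in range(8)]
per_gpu = [all_batches[i::len(devices)] for i in range(len(devices))]

results = [None] * len(devices)
threads = [
    threading.Thread(target=run_inference, args=(m, d, b, results, i))
    for i, (m, d, b) in enumerate(zip(models, devices, per_gpu))
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

From what I understand, the forward pass releases the GIL while the CUDA work runs, so the two threads should be able to overlap, but I'm not certain this is the idiomatic approach, or whether torch.multiprocessing (or a proper serving framework) would be the better route inside Docker.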
Any help or guidance on how to best set this up using PyTorch inside Docker would be greatly appreciated!