Why does one GPU show 0% utilization when running a PyTorch DDP task?

I used four RTX 3090 GPUs to train a ResNet-18 model, but I don't understand why the utilization of one of the GPUs is always 0.

Following the example at https://github.com/pytorch/examples/tree/master/imagenet, I ran the following command:

python main.py --arch resnet18 --multiprocessing-distributed --world-size 1 --rank 0 --batch-size 1024 --epochs 90
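For context, in that example `--multiprocessing-distributed` spawns one worker process per visible GPU, and each worker is supposed to handle an equal slice of the global batch. A minimal sketch of that division of work (the numbers are taken from the command above; the dictionary layout is illustrative, not the script's actual code):

```python
# Simplified sketch of how pytorch/examples/imagenet's
# multiprocessing-distributed mode splits work across GPUs.
# Values below mirror the question's setup; names are illustrative.
ngpus_per_node = 4          # four RTX 3090s in the question
global_batch_size = 1024    # from --batch-size 1024

# The script divides the global batch evenly among the spawned workers,
# one worker (and one process rank) per local GPU index.
per_gpu_batch = global_batch_size // ngpus_per_node

workers = [
    {"local_rank": gpu, "batch_size": per_gpu_batch}
    for gpu in range(ngpus_per_node)
]

for w in workers:
    print(w)
```

If all four workers are actually spawned and pinned to distinct GPUs, each should process 256 samples per step, so a GPU sitting at 0% suggests its worker is either not running or not receiving work.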

The GPU utilization results recorded by Grafana are as follows:

[Grafana screenshot: per-GPU utilization over time]
I am not familiar with the application you used to make these graphs, but to me the gray box in the image seems to be reporting the usage at a particular instant (21:06:04), not over the whole run. Is that not so?

My wording may not have been precise, but if you look at the green curve, it stays at the bottom almost the entire time.