siaimes
(siaimes)
September 15, 2021, 4:22am
1
I am training the resnet18 model on four RTX 3090 graphics cards, but the utilization of one of the cards is always 0 and I don't know why.
I followed this doc https://github.com/pytorch/examples/tree/master/imagenet and ran the following command:
python main.py --arch resnet18 --multiprocessing-distributed --world-size 1 --rank 0 --batch-size 1024 --epochs 90
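For context on what this command asks for: as far as I understand the example script, `--multiprocessing-distributed` starts one process per visible GPU on the node, treats `--world-size` as the number of nodes, and splits `--batch-size` evenly across the GPUs. The sketch below only reproduces that arithmetic under those assumptions. The helper name `per_process_settings` is made up for illustration and is not part of `main.py`.

```python
def per_process_settings(nodes: int, gpus_per_node: int, total_batch_size: int):
    """Return (total number of processes, batch size per GPU).

    Assumes one process per GPU and an even split of the batch,
    which is how the example script appears to behave.
    """
    world_size = nodes * gpus_per_node
    per_gpu_batch = total_batch_size // gpus_per_node
    return world_size, per_gpu_batch


# One node, four GPUs, --batch-size 1024 as in the command above.
world_size, per_gpu_batch = per_process_settings(
    nodes=1, gpus_per_node=4, total_batch_size=1024
)
print(world_size, per_gpu_batch)  # 4 256
```

If that reading is right, all four cards should each receive 256 images per step, so a card sitting at 0 would not be the expected behaviour.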
The GPU utilization results recorded by Grafana are as follows:
gphilip
(G Philip)
September 15, 2021, 4:37am
2
I am not familiar with the application that you used to make these graphs, but to me the gray box in the image seems to report the usage at one particular instant (21:06:04), not over the whole run. Is this not so?
siaimes
(siaimes)
September 15, 2021, 7:56am
3
My wording may have been imprecise, but look at the green curve: it stays at the bottom almost the whole time.
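One way to settle the "single instant or whole run" question without relying on the dashboard is to sample the utilization directly while training is running. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and reports a numeric utilization for every card. The function names and the 5% idle threshold are arbitrary choices for illustration.

```python
import subprocess
import time

# Ask nvidia-smi for "index, utilization" pairs as bare CSV, e.g. "0, 97".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Turn lines like '0, 97' into {0: 97}."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, util = (field.strip() for field in line.split(","))
        result[int(index)] = int(util)
    return result


def sample(num_samples=30, interval_s=1.0):
    """Collect utilization readings per GPU over num_samples polls."""
    history = {}
    for _ in range(num_samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        for gpu, util in parse_utilization(out).items():
            history.setdefault(gpu, []).append(util)
        time.sleep(interval_s)
    return history


def idle_gpus(history, threshold=5):
    """GPUs whose utilization never exceeded `threshold` percent."""
    return sorted(
        gpu for gpu, values in history.items() if max(values) <= threshold
    )


# Usage while training is running in another terminal:
#   history = sample()
#   print(idle_gpus(history))
```

If a card shows up in the `idle_gpus` result over a window of 30 seconds or more, the flat green curve reflects a GPU that is idle for the whole window and not just one sampled moment.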