When I use mixed precision training, GPU utilization drops a lot, as shown below:
Thu Oct 8 23:42:03 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:04:00.0 Off | N/A |
| 51% 53C P2 116W / 250W | 8990MiB / 11019MiB | 78% Default |
+-------------------------------+----------------------+----------------------+
| 1 GeForce RTX 208... Off | 00000000:05:00.0 Off | N/A |
| 58% 56C P2 201W / 250W | 8990MiB / 11019MiB | 80% Default |
+-------------------------------+----------------------+----------------------+
| 2 GeForce RTX 208... Off | 00000000:08:00.0 Off | N/A |
| 58% 56C P2 151W / 250W | 8990MiB / 11019MiB | 79% Default |
+-------------------------------+----------------------+----------------------+
| 3 GeForce RTX 208... Off | 00000000:09:00.0 Off | N/A |
| 58% 56C P2 108W / 250W | 8990MiB / 11019MiB | 75% Default |
+-------------------------------+----------------------+----------------------+
| 4 GeForce RTX 208... Off | 00000000:84:00.0 Off | N/A |
| 59% 56C P2 150W / 250W | 8990MiB / 11019MiB | 77% Default |
+-------------------------------+----------------------+----------------------+
| 5 GeForce RTX 208... Off | 00000000:85:00.0 Off | N/A |
| 57% 56C P2 102W / 250W | 8990MiB / 11019MiB | 81% Default |
+-------------------------------+----------------------+----------------------+
| 6 GeForce RTX 208... Off | 00000000:88:00.0 Off | N/A |
| 53% 54C P2 163W / 250W | 8990MiB / 11019MiB | 76% Default |
+-------------------------------+----------------------+----------------------+
| 7 GeForce RTX 208... Off | 00000000:89:00.0 Off | N/A |
| 61% 57C P2 141W / 250W | 8990MiB / 11019MiB | 72% Default |
+-------------------------------+----------------------+----------------------+
But with fp32, utilization was nearly 100%. What is going wrong? My environment is:
In [1]: import torch
In [2]: torch.__version__
Out[2]: '1.6.0'
In [3]: torch.version.cuda
Out[3]: '10.2'
In [4]: torch.backends.cudnn.version()
Out[4]: 7605
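For context, by "mixed precision training" I mean PyTorch's native AMP, which landed in 1.6. A minimal sketch of such a training step is below; the model, optimizer, and tensor shapes are placeholders for illustration, not my actual code:

```python
import torch
from torch import nn

# Placeholder model/data; the real training code differs.
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = torch.cuda.is_available()

model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# GradScaler rescales the loss to avoid fp16 gradient underflow.
# With enabled=False (e.g. on CPU) it becomes a no-op.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 4, device=device)

optimizer.zero_grad()
# autocast runs eligible ops in fp16 on CUDA; no-op when disabled.
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = nn.functional.mse_loss(model(inputs), targets)
scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(optimizer)         # unscales grads, then optimizer.step()
scaler.update()                # adjusts the scale factor for next step
```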