Hi all, I found that logger.info prints additional output when using distributed training (DDP).
With PyTorch 1.6, calling logger.info gives:
Epoch 1, Node 0, GPU 3, Iter 300, Top1 Accuracy:0.083602, Loss:5.4438, 132 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 6, Iter 300, Top1 Accuracy:0.086197, Loss:5.4218, 129 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 1, Iter 300, Top1 Accuracy:0.084302, Loss:5.4295, 86 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 2, Iter 300, Top1 Accuracy:0.083498, Loss:5.4405, 130 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 0, Iter 300, Top1 Accuracy:0.087469, Loss:5.4242, 131 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 5, Iter 300, Top1 Accuracy:0.083731, Loss:5.4297, 127 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 7, Iter 300, Top1 Accuracy:0.0856, Loss:5.4234, 130 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 4, Iter 300, Top1 Accuracy:0.08355, Loss:5.4322, 86 samples/s. lr: 0.64553.
But after I updated PyTorch to 1.7, every message is printed twice, the second time with an "INFO:<logger name>:" prefix:
Epoch 1, Node 0, GPU 3, Iter 300, Top1 Accuracy:0.083602, Loss:5.4438, 132 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 3, Iter 300, Top1 Accuracy:0.083602, Loss:5.4438, 132 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 6, Iter 300, Top1 Accuracy:0.086197, Loss:5.4218, 129 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 6, Iter 300, Top1 Accuracy:0.086197, Loss:5.4218, 129 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 1, Iter 300, Top1 Accuracy:0.084302, Loss:5.4295, 86 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 1, Iter 300, Top1 Accuracy:0.084302, Loss:5.4295, 86 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 2, Iter 300, Top1 Accuracy:0.083498, Loss:5.4405, 130 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 2, Iter 300, Top1 Accuracy:0.083498, Loss:5.4405, 130 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 0, Iter 300, Top1 Accuracy:0.087469, Loss:5.4242, 131 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 0, Iter 300, Top1 Accuracy:0.087469, Loss:5.4242, 131 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 5, Iter 300, Top1 Accuracy:0.083731, Loss:5.4297, 127 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 5, Iter 300, Top1 Accuracy:0.083731, Loss:5.4297, 127 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 7, Iter 300, Top1 Accuracy:0.0856, Loss:5.4234, 130 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 7, Iter 300, Top1 Accuracy:0.0856, Loss:5.4234, 130 samples/s. lr: 0.64553.
Epoch 1, Node 0, GPU 4, Iter 300, Top1 Accuracy:0.08355, Loss:5.4322, 86 samples/s. lr: 0.64553.
INFO:Distribute training logs.:Epoch 1, Node 0, GPU 4, Iter 300, Top1 Accuracy:0.08355, Loss:5.4322, 86 samples/s. lr: 0.64553.
I'm sure I only updated the PyTorch version and did not change any other package.
What is printing the additional line, and how can I stop it?
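
One observation that may help narrow this down: the prefix on the extra lines, INFO:<logger name>:<message>, matches the standard library's default format (logging.BASIC_FORMAT), which is what a handler on the root logger uses after logging.basicConfig(). So a plausible explanation is that something in the process now attaches a handler to the root logger, and my logger's records propagate up to it. I have not confirmed which module does this. The sketch below reproduces the symptom with the standard library only and shows that setting propagate = False on the logger removes the duplicate. The emit helper and the shortened message are made up for illustration, and the root handler is added by hand as a stand-in for whatever adds it in the real run.

```python
import io
import logging


def emit(propagate):
    """Log one message; return (own handler output, root handler output)."""
    own_out, root_out = io.StringIO(), io.StringIO()

    # Stand-in for whatever installs a root handler in the real run (assumption).
    root = logging.getLogger()
    root_handler = logging.StreamHandler(root_out)
    root_handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    root.addHandler(root_handler)

    # The training logger with its own handler, which prints the bare message.
    logger = logging.getLogger("Distribute training logs.")
    logger.setLevel(logging.INFO)
    own_handler = logging.StreamHandler(own_out)
    logger.addHandler(own_handler)

    # With propagate=True the record is also passed to the root handler.
    logger.propagate = propagate
    try:
        logger.info("Epoch 1, Node 0, GPU 3, Iter 300")
    finally:
        # Undo the changes so repeated calls start from a clean state.
        logger.removeHandler(own_handler)
        root.removeHandler(root_handler)
        logger.propagate = True
    return own_out.getvalue(), root_out.getvalue()


own, dup = emit(propagate=True)            # message is printed twice
own_only, nothing = emit(propagate=False)  # only the logger's own handler fires
print(own, dup, sep="")
print(own_only, nothing, sep="")
```

If this is indeed the cause, adding logger.propagate = False right after creating the logger in the training script should be enough. Removing the handlers from the root logger would also work, but that affects every other logger in the process.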