I am training my model on one GPU (V100); the speed is as below:
2019-09-17 01:46:48,876 - INFO - [ 1022/10000] lr: 0.000100 Time 1.773 ( 1.744) Data 0.001 ( 0.001) Loss 6341.771 (6945.944)
2019-09-17 01:46:50,593 - INFO - [ 1023/10000] lr: 0.000100 Time 1.607 ( 1.722) Data 0.001 ( 0.001) Loss 7225.229 (6958.357)
2019-09-17 01:46:52,323 - INFO - [ 1024/10000] lr: 0.000100 Time 1.717 ( 1.732) Data 0.001 ( 0.001) Loss 7218.038 (6929.233)
About the time info: in an entry such as Time 1.717 ( 1.732), 1.717 is the time for the current batch and 1.732 is the average time over the most recent one hundred batches.
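For reference, the timing is tracked with something like the windowed-average meter sketched below. This is only an illustration of the "current (recent average)" pattern; the class name WindowMeter and its details are my own, not necessarily the exact code in my training script.

```python
import time
from collections import deque


class WindowMeter:
    """Tracks the latest value and the average of the most recent `window` values."""

    def __init__(self, window=100):
        self.val = 0.0                       # most recent measurement
        self.history = deque(maxlen=window)  # keeps only the last `window` values

    def update(self, val):
        self.val = val
        self.history.append(val)

    @property
    def avg(self):
        return sum(self.history) / max(len(self.history), 1)


batch_time = WindowMeter(window=100)
end = time.time()
for step in range(3):
    time.sleep(0.1)                       # stand-in for one training step
    batch_time.update(time.time() - end)  # record how long this step took
    end = time.time()
    print(f"Time {batch_time.val:.3f} ({batch_time.avg:.3f})")
```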
When I train with 8 GPUs on one node, using torch.nn.parallel.DistributedDataParallel, torch.nn.SyncBatchNorm.convert_sync_batchnorm, and mp.spawn (a minimal sketch of this setup is at the end of the post), the speed is as below:
2019-09-16 06:06:40,619 - INFO - [ 9/5000] lr: 0.000036 Time 2.822 ( 4.896) Data 0.001 ( 1.428) Loss 307113.969 (331260.794)
2019-09-16 06:06:43,485 - INFO - [ 10/5000] lr: 0.000037 Time 3.419 ( 4.749) Data 0.001 ( 0.001) Loss 303037.688 (325792.062)
2019-09-16 06:06:46,120 - INFO - [ 11/5000] lr: 0.000037 Time 2.866 ( 2.943) Data 0.001 ( 0.001) Loss 296579.000 (320417.425)
2019-09-16 06:06:48,925 - INFO - [ 12/5000] lr: 0.000037 Time 2.634 ( 2.879) Data 0.001 ( 0.001) Loss 292080.625 (315081.881)
2019-09-16 06:06:51,671 - INFO - [ 13/5000] lr: 0.000038 Time 2.806 ( 2.847) Data 0.001 ( 0.001) Loss 286678.000 (309843.294)
Comparing per-iteration times, the single-GPU run takes about 1.73 s per iteration while the 8-GPU run takes about 2.85 s, so the scaling efficiency is only about 0.6 (1.73 / 2.85 ≈ 0.61), far from linear. Why doesn't DDP scale better here?
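For completeness, here is a minimal, self-contained sketch of the 8-GPU setup I described above. The model, dataset, batch size, learning rate, and MASTER_ADDR/MASTER_PORT are placeholders for illustration, not my actual training code; only the DDP + SyncBatchNorm + mp.spawn structure matches my run.

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def main_worker(rank, world_size):
    # Single-node rendezvous; address/port here are placeholder assumptions.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Placeholder model; BatchNorm layers are converted to SyncBatchNorm
    # before wrapping in DDP, as in my run.
    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 16, 3, padding=1),
        torch.nn.BatchNorm2d(16),
        torch.nn.ReLU(),
    ).cuda(rank)
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[rank])

    # Placeholder data; DistributedSampler shards the dataset across ranks.
    dataset = TensorDataset(torch.randn(1024, 3, 32, 32))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler,
                        num_workers=4, pin_memory=True)

    optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
    for (x,) in loader:
        x = x.cuda(rank, non_blocking=True)
        loss = model(x).pow(2).mean()  # dummy loss just for the sketch
        optimizer.zero_grad()
        loss.backward()                # DDP all-reduces gradients here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 8  # one process per GPU on a single node
    mp.spawn(main_worker, args=(world_size,), nprocs=world_size)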