Convergence speed on 8 V100 vs 1 V100

Hello everybody,
I have a dataset composed of 1 million training data. I am using Distrubted Data parallel with 8 gpus with batch size of 1200 (150 per gpu). My training steps per GPU are 814 step nearly and the steps on 1 GPU is 6500.
My question is: when training reaches 814 steps in the multi-GPU scenario and 6500 steps in the single-GPU scenario, should they have roughly the same loss? In other words, will the multi-GPU run converge nearly 7 times faster, or how should I think about the difference? (I cannot easily experiment with 1 GPU because it takes 1 hour and 22 minutes per epoch.)
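For reference, here is the step arithmetic behind the numbers in my question (a rough sanity check; the exact counts in my runs differ slightly, presumably due to how the sampler pads or drops the last batch):

```python
import math

# Numbers from the setup described above.
num_samples = 1_000_000
per_gpu_batch = 150
num_gpus = 8

# Single GPU: each optimizer step consumes one batch of 150 samples.
steps_1gpu = math.ceil(num_samples / per_gpu_batch)

# DDP with 8 GPUs: DistributedSampler splits the dataset across ranks,
# so each step consumes an effective global batch of 150 * 8 = 1200.
steps_8gpu = math.ceil(num_samples / (per_gpu_batch * num_gpus))

print(steps_1gpu, steps_8gpu, steps_1gpu / steps_8gpu)
```

So each epoch takes ~8x fewer optimizer steps on 8 GPUs, but each step sees an 8x larger effective batch, which is why I am unsure the loss curves line up step-for-step.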