Number of Iterations in a Multi-GPU Setting?

I often see training schedules quoted as a certain number of iterations at a certain batch size on multiple GPUs, and I am wondering how to interpret this.

For example, the original ResNet paper states that training lasted for 64k iterations with a batch size of 128 on 2 GPUs (i.e., 64 per GPU). Do the 64k iterations refer to the number of iterations *per GPU* with a batch size of 64, or to the number of iterations *summed over both GPUs*, each with a batch size of 64?
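
To make the two readings concrete, here is a small back-of-the-envelope sketch (plain Python; the numbers come from the question, except the 50k CIFAR-10 training-set size, which is my assumption for the epoch estimate):

```python
# Arithmetic behind the two possible readings of "64k iterations".
total_iterations = 64_000
global_batch = 128                          # examples per synchronized update
num_gpus = 2
per_gpu_batch = global_batch // num_gpus    # 64 examples per GPU
train_set_size = 50_000                     # CIFAR-10 training images (assumption)

# Reading 1: 64k iterations *per GPU*, both GPUs stepping in lockstep,
# so there are 64k synchronized updates of 128 examples each.
samples_1 = total_iterations * per_gpu_batch * num_gpus
print(f"Reading 1: {samples_1:,} samples (~{samples_1 / train_set_size:.0f} epochs)")

# Reading 2: 64k per-GPU iterations *summed over both GPUs*,
# i.e. only 32k synchronized updates of 128 examples each.
samples_2 = (total_iterations // num_gpus) * global_batch
print(f"Reading 2: {samples_2:,} samples (~{samples_2 / train_set_size:.0f} epochs)")
```

As the sketch shows, the two readings differ by a factor of 2 in how much data the model actually sees (≈8.2M vs. ≈4.1M samples), which is why I would like to pin down the convention.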