SyncBatchNorm gives lower accuracy than BatchNorm

I ran 3 experiments:
(1) BatchNorm2d, num_gpus = 1, batch_size = 2, learning_rate = 2e-4 -> accuracy = a1
(2) SyncBatchNorm (converted from BatchNorm2d), num_gpus = 2, batch_size = 1, learning_rate = 2e-4 -> accuracy = a2
(3) SyncBatchNorm (converted from BatchNorm2d), num_gpus = 2, batch_size = 1, learning_rate = 1e-4 -> accuracy = a3

All other hyperparameters are identical across the 3 experiments. I expected either a1 = a2 or a1 = a3, but both a2 and a3 are lower than a1. Also, when using SyncBatchNorm, the network converges more slowly. Any idea what is happening?
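For what it's worth, here is the reasoning behind my expectation that a1 = a2, as a pure-Python sketch (no PyTorch needed; the activation values are made up for one channel):

```python
# Why SyncBatchNorm across 2 GPUs x batch_size 1 should compute the
# same normalization statistics as BatchNorm2d on 1 GPU x batch_size 2.

def bn_stats(values):
    """Mean and biased variance, as batch norm uses when normalizing."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var

# Experiment (1): single GPU, batch_size = 2 -> stats over both samples.
gpu_batch = [0.5, 1.5]
mean_bn, var_bn = bn_stats(gpu_batch)

# Experiment (2): two GPUs, batch_size = 1 each. SyncBatchNorm
# all-reduces the per-device sums, so the stats are computed over
# the union of both shards -- the same two samples as above.
shard0, shard1 = [0.5], [1.5]
mean_sync, var_sync = bn_stats(shard0 + shard1)

print(mean_bn, var_bn)      # 1.0 0.25
print(mean_sync, var_sync)  # 1.0 0.25 -- identical statistics
```

So with identical data, statistics, and an effective batch size of 2 in both cases, I assumed the runs would be equivalent.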