Why would SyncBatchNorm give different results from BatchNorm?

Regarding worse results, could you try setting:

torch.backends.cudnn.enabled = False

Per a few resources, such as Training performance degrades with DistributedDataParallel - #32 by dabs, this appears to help with accuracy/convergence-related issues.
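For reference, a minimal sketch of where the flag could go — the flag itself is the one mentioned above, but the model, optimizer, and input are placeholders standing in for your actual training setup, and a CUDA device is assumed:

```python
import torch
import torch.nn as nn

# Disable the cuDNN backend before building the model / starting training,
# so BatchNorm (and other ops) fall back to the native implementations.
torch.backends.cudnn.enabled = False

# Hypothetical model and training step, for illustration only.
model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU()).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 3, 32, 32, device="cuda")
out = model(x)
out.mean().backward()
optimizer.step()
```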

Furthermore, the cuDNN backend is known to be nondeterministic; see, for example, Batchnorm gives different results depending on whether cudnn is enabled · Issue #8283 · pytorch/pytorch · GitHub. Could you also try setting torch.backends.cudnn.deterministic = True to help check whether that yields equivalent outputs?
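As a quick sanity check — a minimal sketch assuming a single GPU and a plain nn.BatchNorm2d rather than your actual model — you could verify that two identical runs produce bit-identical gradients once the flag is set:

```python
import torch
import torch.nn as nn

# Request deterministic cuDNN kernels (possibly at a performance cost).
torch.backends.cudnn.deterministic = True

def run_once(seed: int = 0) -> torch.Tensor:
    """One forward/backward pass through a BatchNorm layer; returns the weight grad."""
    torch.manual_seed(seed)
    bn = nn.BatchNorm2d(16).cuda()
    x = torch.randn(8, 16, 32, 32, device="cuda", requires_grad=True)
    bn(x).sum().backward()
    return bn.weight.grad.clone()

# With deterministic kernels, repeated runs should match exactly.
print(torch.equal(run_once(), run_once()))
```

If the two runs match but SyncBatchNorm and BatchNorm still differ, that would point away from cuDNN nondeterminism as the cause.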