Continuing training from a checkpoint returns high loss values, while loss is reasonable with .eval()

In addition, you can try setting torch.backends.cudnn.enabled = False when training with SyncBatchNorm and DDP, as discussed in Training performance degrades with DistributedDataParallel.
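The snippet below is a minimal sketch of where that flag fits in a SyncBatchNorm + DDP setup, assuming a single-node run launched with torchrun; the model, optimizer, and data are placeholders, and only the cudnn line is the workaround itself.

```python
# Minimal sketch (single-node DDP launched with torchrun); the model and
# hyperparameters are hypothetical placeholders for illustration only.
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # Workaround discussed above: disable cuDNN for this run.
    torch.backends.cudnn.enabled = False

    # Typical DDP initialization; rank comes from the launcher's env vars.
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Placeholder model containing BatchNorm layers.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1),
        nn.BatchNorm2d(16),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(16, 10),
    ).cuda(rank)

    # Convert BatchNorm layers to SyncBatchNorm before wrapping in DDP.
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = DDP(model, device_ids=[rank])

    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Dummy training step, just to show the setup end to end.
    inputs = torch.randn(8, 3, 32, 32, device=rank)
    targets = torch.randint(0, 10, (8,), device=rank)
    loss = criterion(model(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

A run of this sketch would look like torchrun --nproc_per_node=2 train.py (file name assumed). Note that disabling cuDNN trades some speed for avoiding the cuDNN code paths, so it is a diagnostic workaround rather than a permanent setting.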