Test-set accuracy is higher when training with DP than with DDP, even on 1 GPU

Test-set accuracy is higher when I train with DP than when I train with DDP, even when using only 1 GPU. However, the training losses are almost the same in both cases.