Using NVIDIA apex for training, I cannot reproduce the same accuracy after training

Hi, I have a question about NVIDIA apex.
As I understand it, the NVIDIA apex package creates one process per GPU.

So each process is identified by the local_rank variable in my code.
I want to save the best accuracy from each process, and I coded it as shown below.

When I use 2 GPUs:

def save_best():
    # best_acc is assumed to be a module-level running best kept by each process
    global best_acc
    # GPU 0 (local_rank 0)
    if args.local_rank == 0:
        is_best = test_acc > best_acc
        best_acc = max(test_acc, best_acc)
        if is_best:
            torch.save(...)
    # GPU 1 (local_rank 1)
    if args.local_rank == 1:
        is_best = test_acc > best_acc
        best_acc = max(test_acc, best_acc)
        if is_best:
            torch.save(...)

for epoch in range(0, args.epoch):
    train()
    test()
    ...
    save_best()

After 1 epoch I can check the accuracy reported by each process:
GPU 0's accuracy is 19.906, and its weights are saved to the 0th weight file.
GPU 1's accuracy is 19.269, and its weights are saved to the 1st weight file.

But when I load each weight file into the network, the test accuracy does not match either of those results.
I got 19.572 (0th file) and 19.561 (1st file).

Surprisingly, when I train with 1 GPU, this does not happen: the test accuracy during training equals the accuracy I get after loading the weight file.

I can't understand why this happens.
Can anybody help?
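
One possible reading of these numbers, offered only as an assumption because the post does not show the data loading code: if the test set is split across the two processes (for example by a distributed sampler), each rank measures accuracy on its own half only, while a later single-process evaluation covers the whole set. The plain average of the two per-rank figures, (19.906 + 19.269) / 2 = 19.5875, is close to the 19.572 and 19.561 measured after loading. The sketch below shows how per-rank counts would combine into one overall accuracy. The helper name combine_accuracy and the sample counts are hypothetical.

```python
def combine_accuracy(per_rank_counts):
    """Return overall accuracy in percent from (correct, total) pairs, one per rank."""
    correct = sum(c for c, _ in per_rank_counts)
    total = sum(t for _, t in per_rank_counts)
    return 100.0 * correct / total


# Hypothetical counts, assuming each rank evaluated a different half of the test set.
rank0 = (995, 5000)  # 19.90 % on rank 0's shard
rank1 = (963, 5000)  # 19.26 % on rank 1's shard

overall = combine_accuracy([rank0, rank1])
print(f"overall accuracy: {overall:.2f}")  # overall accuracy: 19.58
```

In an actual multi-process run the correct and total counts would have to be summed across processes before dividing, for instance with torch.distributed.all_reduce, or the evaluation could be run on the full test set in a single process.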

We recommend using the native mixed-precision training utility, torch.cuda.amp, instead of apex, as it should cover more tested use cases.
More information can be found here.
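
For illustration, a minimal sketch of a training step with torch.cuda.amp is shown below. The model, data, and hyperparameters are placeholders and not taken from the original post; mixed precision is switched off automatically when no GPU is available so that the snippet still runs on CPU.

```python
import torch
import torch.nn as nn

# Placeholder setup (assumed, not from the original post).
device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"  # autocast and loss scaling only take effect on GPU here

model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(16, 8, device=device)
targets = torch.randint(0, 2, (16,), device=device)

for step in range(3):
    optimizer.zero_grad()
    # Run the forward pass and loss computation under autocast.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = criterion(model(inputs), targets)
    # Scale the loss before backward, then step and update the scale factor.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

With enabled=False the scaler and autocast context become no-ops, so the same loop serves both full-precision and mixed-precision runs.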