How to write the right code for distributed training?

Hi,

This might be related to this thread; you can check this comment for debugging steps.