Hi,
Thanks for reading this post. I recently got access to multiple GPUs, so I started using distributed training. I have a few questions about the all_gather() function:
- If I do not use it, does that mean my models will be updated separately on the different GPUs?
- If I do need to use it, can I simply apply it to the computed loss? Would that give me a combined version of the loss that updates the models on all GPUs in the same way?
- What about BN layers? If I only apply it to the loss, will BN layers and other normalization layers be updated correctly?
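To make the second question concrete, here is a minimal sketch of what I have in mind. It assumes PyTorch's `torch.distributed` with a process group that has already been initialized. `gather_mean_loss` is a hypothetical helper name of my own, not something from the library.

```python
import torch
import torch.distributed as dist


def gather_mean_loss(loss: torch.Tensor) -> torch.Tensor:
    """Average a scalar loss over all ranks (hypothetical helper)."""
    # Fall back to the local value when no process group is running,
    # so the function can also be tried in a single process.
    if not (dist.is_available() and dist.is_initialized()):
        return loss.detach()

    world_size = dist.get_world_size()
    # all_gather fills a pre-allocated list with one tensor per rank.
    gathered = [torch.zeros_like(loss) for _ in range(world_size)]
    dist.all_gather(gathered, loss.detach())
    # The gathered tensors are plain copies, not part of the autograd graph.
    return torch.stack(gathered).mean()
```

As far as I can tell, the value returned here carries no gradient, so it may only be useful for logging, which is part of why I am asking.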
Thanks for your time!