I found this in the documentation:
broadcast_buffers: flag that enables syncing (broadcasting) buffers of the module at beginning of the forward function. (default: True)
But what exactly does that mean? If I set it to False, it looks from the code like the gradients are still reduced across replicas, which should result in the same buffers on every replica on every node. Is that right? Please let me know if I am wrong.
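
To make the question concrete, here is a minimal single-process sketch of what I understand "buffers" to be. The toy model and the choice of `BatchNorm1d` are my own assumptions for illustration and do not come from the documentation. It only checks which tensors are parameters, which are buffers, and whether the buffers receive gradients or change during the forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy model: BatchNorm1d is used only because it registers buffers
# (running_mean, running_var, num_batches_tracked) alongside its parameters.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))

param_names = [name for name, _ in model.named_parameters()]
buffer_names = [name for name, _ in model.named_buffers()]

# In training mode, BatchNorm updates its running statistics inside forward().
model.train()
before = model[1].running_mean.clone()
out = model(torch.randn(8, 4) + 3.0)
out.sum().backward()

# Buffers are not trainable, so none of them should require a gradient.
buffers_with_grad = [n for n, b in model.named_buffers() if b.requires_grad]
running_mean_changed = not torch.equal(before, model[1].running_mean)

print("parameters:", param_names)
print("buffers:", buffer_names)
print("buffers requiring grad:", buffers_with_grad)
print("running_mean changed in forward:", running_mean_changed)
```

Since these tensors get no gradient and are updated from each replica's local batch, it is not obvious to me that gradient reduction alone keeps them identical across replicas.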