Question about the dist.all_gather operation and backward

I have a question about dist.all_gather(). Suppose there are 4 GPUs; after the call, each GPU holds the values from all the others. If we then compute a loss from the gathered result and the ground truth, can that loss backpropagate? Or does dist.all_gather break the computation graph, the way detach() does?
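To make the question concrete, here is a minimal sketch of the check I have in mind. It is only an illustration under some assumptions: a single-process "gloo" group on CPU stands in for the 4-GPU setup, and the helper names `gather_and_check` and `run_single_process_demo` are made up for this post. It gathers a tensor that requires grad and reports whether each gathered copy is still tracked by autograd.

```python
import torch
import torch.distributed as dist


def gather_and_check(local: torch.Tensor) -> list:
    """All-gather `local` and report, per rank slot, whether the gathered
    copy is tracked by autograd (requires_grad)."""
    world_size = dist.get_world_size()
    # Output buffers are allocated by the caller and filled in by all_gather.
    gathered = [torch.zeros_like(local) for _ in range(world_size)]
    dist.all_gather(gathered, local)
    return [t.requires_grad for t in gathered]


def run_single_process_demo() -> list:
    # Assumption: one process with the CPU "gloo" backend, used only so the
    # sketch can run without 4 GPUs. A real job would be launched with one
    # process per GPU (for example via torchrun).
    dist.init_process_group(
        "gloo", store=dist.HashStore(), rank=0, world_size=1
    )
    try:
        # Stand-in for a model output that is part of the autograd graph.
        local = torch.randn(2, 3, requires_grad=True)
        return gather_and_check(local)
    finally:
        dist.destroy_process_group()


if __name__ == "__main__":
    # One flag per rank: True means the gathered copy is tracked by autograd.
    print(run_single_process_demo())
```

If the flags come back False, the gathered tensors are not connected to the graph, which is the detach()-like behavior I am asking about.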
