epignatelli (Eduardo Pignatelli) | August 1, 2020, 6:11am | #1
What is the correct way to update a class variable once the model has been wrapped in DistributedDataParallel?
In the case below, if we take a snapshot of self.k for each model on each GPU at the same time, we can get different results. Any idea why that happens and how to solve it? I guess loss would be different across the models, so the callback would fire on some replicas but not others?
import torch


class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules and buffers
        self.fc = torch.nn.Linear(128, 128)
        # buffers must be tensors, not plain Python scalars
        self.register_buffer("k", torch.tensor(0))
        # update the buffer in place when the callback fires
        self.callback = lambda loss, k: k.add_(1)

    def forward(self, x):
        print(self.k)
        return self.fc(x)

    def training_step(self, x, y):
        y_hat = self(x)
        # the raw linear outputs are logits, so use the with_logits variant
        loss = torch.nn.functional.binary_cross_entropy_with_logits(y_hat, y)
        loss.backward()
        if loss > 1.:
            self.callback(loss, self.k)
mrshenli (Shen Li) | August 3, 2020, 2:43pm | #2
Hey @epignatelli,
When did you take the snapshot of self.k? DDP broadcasts all buffers from the rank 0 process to the other processes right before calling Module.forward; see the code below. So, given the above code, the buffer should be consistent across all processes by the time self.callback is launched.
# module buffer sync
if self.broadcast_buffers and len(self.modules_buffers[0]) > 0:
    # Synchronize buffers across processes.
    # The process with rank 0 is considered the authoritative copy.
    self._distributed_broadcast_coalesced(
        self.modules_buffers[0],
        self.broadcast_bucket_size)
    # only do intra-node buffer sync for replicated single-device
    # CUDA modules
    if self.device_ids and len(self.device_ids) > 1:
        # intra-node buffer sync
        result = comm.broadcast_coalesced(
            self.modules_buffers[0],
            self.device_ids,
            self.broadcast_bucket_size)
        for tensors, module_buffers in zip(result[1:],
                                           self.modules_buffers[1:]):
            for tensor, buffer in zip(tensors, module_buffers):
                buffer.set_(tensor)
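
To make the timing concrete, here is a minimal sketch (not from the original posts) that assumes a process group is already initialized, e.g. launched via torchrun with the gloo backend, and reuses the Model class from the question. A buffer mutated locally after forward looks different across ranks until the next forward call re-broadcasts rank 0's copy:

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical demo function, illustrative only. Assumes
# dist.init_process_group has already been called on every rank.
def demo():
    rank = dist.get_rank()
    model = DDP(Model())  # broadcast_buffers=True is the default

    x = torch.randn(4, 128)
    model(x)                     # buffers synced from rank 0 here
    model.module.k.add_(rank)    # local, unsynchronized update
    print(rank, model.module.k)  # snapshots can now differ across ranks
    model(x)                     # rank 0's k is broadcast again
    print(rank, model.module.k)  # consistent once more

If you instead want each rank to keep its own value of k, you can construct DDP with broadcast_buffers=False, at the cost of losing the automatic sync for all buffers (including things like BatchNorm running statistics).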
mrshenli (Shen Li) | August 3, 2020, 2:44pm | #3
BTW, could you please add a "distributed" tag to distributed-training related questions, so that the people working on it can get back to you promptly? Thanks!