Manually divide data across different GPUs in DataParallel

I have several GPUs on the server, but they are shared with other users, so some may be more available than others. How can I send data to different GPUs in an imbalanced way?

If you would like to use some GPUs exclusively (so that one card is not shared by two people), you can set the environment variable CUDA_VISIBLE_DEVICES=0,1 to restrict the process to GPUs 0 and 1.
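For reference, a minimal sketch of that first option; the variable has to be set before CUDA is initialized, so setting it before importing torch is the safest:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # only GPUs 0 and 1 are visible to this process

import torch
print(torch.cuda.device_count())  # prints 2 on a machine with these GPUs; they are renumbered as cuda:0 and cuda:1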

If what you mean is to divide a mini-batch unevenly across GPUs, things are more complicated:
For DP, I’m not sure if there is an ‘elegant’ way. A possible hack is to manually split the mini-batch before calling the forward() function and wrap it in a list like [batch[:10], batch[10:50]]. Inside the forward() function, each GPU fetches its corresponding sub-batch according to its GPU id.

Yes, exactly! I am thinking about the latter. Do you mean that when I send x.to(device) for each batch, I should explicitly do x[:10].to(torch.device('cuda:1')) and x[10:].to(torch.device('cuda:0')) instead? I guess not inside forward()?

I wrote an example:

import numpy as np
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.param = nn.parameter.Parameter(torch.empty(1))

    def forward(self, x):
        # x is the full list of sub-batches on every replica; pick the one that
        # belongs to this replica via the device index of its parameters.
        x = torch.tensor(
            x[self.param.device.index],
            device=self.param.device
        )
        print(x)


if __name__ == "__main__":
    model = MyModel().cuda()
    # The list below holds two sub-batches, so restrict DataParallel to two GPUs.
    model = nn.DataParallel(model, device_ids=[0, 1])

    x = np.arange(32)
    x = [x[:10], x[10:]]  # uneven split: 10 samples for GPU 0, 22 for GPU 1
    model(x)

You should pass a non-tensor type to forward(); otherwise DP would split it automatically (tensor arguments are chunked evenly across the devices). Here I passed np.ndarrays to forward(), and the splitting into sub-batches is done before forward() is called.
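If it helps, here is a rough sketch of that behaviour with a toy module of my own (assuming two visible GPUs): a tensor argument is chunked by DataParallel automatically, while a non-tensor argument such as a np.ndarray reaches every replica unchanged.

import numpy as np
import torch
import torch.nn as nn


class Probe(nn.Module):
    def forward(self, x):
        if torch.is_tensor(x):
            # Tensor input: DataParallel scatters it, each replica sees part of the batch.
            print(x.device, x.shape)
        else:
            # Non-tensor input: every replica receives the full object.
            print(type(x), len(x))
        return torch.zeros(1, device="cuda")


if __name__ == "__main__":
    dp = nn.DataParallel(Probe().cuda(), device_ids=[0, 1])
    dp(torch.randn(32, 4))  # chunked into 16 + 16 automatically
    dp(np.arange(32))       # each replica prints the full length 32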

Also note that if the batch is split unevenly, you can’t get the loss scalar with loss_list.mean(); instead, use loss = (loss_list * n_sub_batch).sum() / n_tot_batch.
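For completeness, a minimal sketch of that weighted reduction, with made-up loss values and the 10/22 split from the example above:

import torch

loss_list = torch.tensor([0.8, 0.5])    # per-GPU losses, each a mean over its own sub-batch (made-up values)
n_sub_batch = torch.tensor([10., 22.])  # sub-batch sizes matching x[:10] and x[10:]
n_tot_batch = n_sub_batch.sum()         # 32

loss = (loss_list * n_sub_batch).sum() / n_tot_batch   # batch-weighted mean: 0.59375
print(loss.item(), loss_list.mean().item())            # the plain mean (0.65) would be biased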
