Manually divide data across different GPUs in DataParallel

I have several GPUs on the server, but they are shared with other users, so some may be more available than others. How can I send data to different GPUs in an imbalanced way?

If you would like to use some GPUs exclusively (so that one card is not shared by two people), you can set the environment variable CUDA_VISIBLE_DEVICES=0,1 to restrict the process to GPUs 0 and 1.
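For reference, a minimal sketch of that first option; the variable has to be set before CUDA is initialized, so setting it before importing torch is the safest:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # only GPUs 0 and 1 are visible to this process

import torch
print(torch.cuda.device_count())  # prints 2 on a machine with these GPUs; they are renumbered as cuda:0 and cuda:1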

If what you mean is to divide a mini-batch unevenly across GPUs, things are more complicated:
For DP, I’m not sure if there is an ‘elegant’ way. A possible hack is to manually split the mini-batch before calling the forward() function and wrap it in a list like [batch[:10], batch[10:50]]. Inside the forward() function, each GPU fetches its corresponding sub-batch according to its GPU id.

Yes, exactly! I am thinking about the latter. Do you mean that when I send x.to(device) for each batch, I should explicitly do x[:10].to(torch.device('cuda:1')) and x[10:].to(torch.device('cuda:0')) instead? I guess not inside forward()?

I wrote an example:

import numpy as np
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.param = nn.parameter.Parameter(torch.empty(1))

    def forward(self, x):
        # x is the full list of sub-batches on every replica; pick the one that
        # belongs to this replica via the device index of its parameters.
        x = torch.tensor(
            x[self.param.device.index],
            device=self.param.device
        )
        print(x)


if __name__ == "__main__":
    model = MyModel().cuda()
    # The list below holds two sub-batches, so restrict DataParallel to two GPUs.
    model = nn.DataParallel(model, device_ids=[0, 1])

    x = np.arange(32)
    x = [x[:10], x[10:]]  # uneven split: 10 samples for GPU 0, 22 for GPU 1
    model(x)

You should pass a non-tensor type to forward(); otherwise DP would split it automatically (tensor arguments are chunked evenly across the devices). Here I passed np.ndarrays to forward(), and the splitting into sub-batches is done before forward() is called.
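If it helps, here is a rough sketch of that behaviour with a toy module of my own (assuming two visible GPUs): a tensor argument is chunked by DataParallel automatically, while a non-tensor argument such as a np.ndarray reaches every replica unchanged.

import numpy as np
import torch
import torch.nn as nn


class Probe(nn.Module):
    def forward(self, x):
        if torch.is_tensor(x):
            # Tensor input: DataParallel scatters it, each replica sees part of the batch.
            print(x.device, x.shape)
        else:
            # Non-tensor input: every replica receives the full object.
            print(type(x), len(x))
        return torch.zeros(1, device="cuda")


if __name__ == "__main__":
    dp = nn.DataParallel(Probe().cuda(), device_ids=[0, 1])
    dp(torch.randn(32, 4))  # chunked into 16 + 16 automatically
    dp(np.arange(32))       # each replica prints the full length 32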

Also note that if the batch is split unevenly, you can’t get the loss scalar with loss_list.mean(); instead, use loss = (loss_list * n_sub_batch).sum() / n_tot_batch.
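For completeness, a minimal sketch of that weighted reduction, with made-up loss values and the 10/22 split from the example above:

import torch

loss_list = torch.tensor([0.8, 0.5])    # per-GPU losses, each a mean over its own sub-batch (made-up values)
n_sub_batch = torch.tensor([10., 22.])  # sub-batch sizes matching x[:10] and x[10:]
n_tot_batch = n_sub_batch.sum()         # 32

loss = (loss_list * n_sub_batch).sum() / n_tot_batch   # batch-weighted mean: 0.59375
print(loss.item(), loss_list.mean().item())            # the plain mean (0.65) would be biased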
