Contribution: Dataloader with workers on remote nodes

Hi.

In my lab, our GPU servers have too few CPU cores to load and preprocess data fast enough. As a result, it is difficult to keep the GPUs busy, and our experiments take longer than we would like.
To work around this, I have developed a Dataloader that dispatches item requests to other servers via remote procedure calls over the network. That way, one can offload the dataloader workers to one or several CPU nodes.

It’s over here: CEA-LIST/RPCDataloader on GitHub, a variant of the PyTorch Dataloader using remote workers.

The interface is similar to the regular Dataloader except for a few modifications:

  • Workers must be started manually on remote nodes (you can even put some of them locally if you have spare CPU cores).
  • The dataloader takes the dataset constructor and arguments instead of the dataset instance itself (the dataset object gets instantiated on the workers).
  • The dataloader needs the list of the workers' IP addresses and ports.
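
To make these three points concrete, here is a minimal sketch of the idea using only the Python standard library (`multiprocessing.connection`). It is not the RPCDataloader API: the function names `serve_one_client`, `fetch_items` and `start_local_worker` are made up for illustration, the workers run as local threads instead of remote processes, and `range` stands in for a dataset class because a range object supports `len()` and indexing.

```python
import threading
from multiprocessing.connection import Client, Listener


def serve_one_client(listener):
    """Worker side: build the dataset locally, then answer item requests."""
    with listener:
        with listener.accept() as conn:
            # The first message carries the dataset constructor and its
            # arguments, so the dataset object is created on the worker.
            # Classes are pickled by reference, so on a real remote node the
            # dataset class must be importable there.
            constructor, args, kwargs = conn.recv()
            dataset = constructor(*args, **kwargs)
            while True:
                index = conn.recv()
                if index is None:  # sentinel: the client is done
                    break
                conn.send(dataset[index])


def fetch_items(worker_addresses, constructor, args, indices):
    """Client side: spread item requests round-robin over the workers."""
    conns = [Client(address) for address in worker_addresses]
    try:
        for conn in conns:
            conn.send((constructor, args, {}))
        items = []
        for position, index in enumerate(indices):
            conn = conns[position % len(conns)]
            conn.send(index)
            items.append(conn.recv())
        return items
    finally:
        for conn in conns:
            conn.send(None)
            conn.close()


def start_local_worker():
    """Start a worker thread on a free localhost port; return (host, port)."""
    listener = Listener(("127.0.0.1", 0))  # port 0: the OS picks a free port
    thread = threading.Thread(
        target=serve_one_client, args=(listener,), daemon=True
    )
    thread.start()
    return listener.address


if __name__ == "__main__":
    # Workers are started first, then the client gets their (IP, port) list.
    addresses = [start_local_worker(), start_local_worker()]
    # Dataset "constructor" and arguments: range(0, 100, 10).
    print(fetch_items(addresses, range, (0, 100, 10), [0, 3, 9]))
```

This sketch waits for each item before requesting the next one and uses no authentication (see the `authkey` parameter of `Listener` and `Client`). A practical loader would keep several requests in flight per worker and collate the items into batches.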

Note that you might need a fast network connection, since every item is sent from the workers to the GPU server over the network.
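
As a rough way to judge whether a link is fast enough, one can multiply the item rate by the item size. The workload below (1000 images per second, each a 3x224x224 float32 tensor) is a hypothetical example, not a measurement, and the estimate ignores serialization and protocol overhead.

```python
def required_bandwidth_gbit(items_per_second, bytes_per_item):
    """Network throughput needed to ship preprocessed items, in Gbit/s."""
    return items_per_second * bytes_per_item * 8 / 1e9


# Hypothetical workload: 3x224x224 float32 images, 4 bytes per value.
bytes_per_image = 3 * 224 * 224 * 4  # 602,112 bytes per image

if __name__ == "__main__":
    print(required_bandwidth_gbit(1000, bytes_per_image))
```

Under these assumptions the result is about 4.8 Gbit/s, which would saturate a 1 Gbit/s link several times over.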
