Parallelism with PyTorch. Where to get an overview?

Hello,

so I’m new to PyTorch and quite unsure about parallelism in it. I’ve written my own parallel programs with OpenMP and MPI, but so far only for CPUs (or distributed systems with CPU nodes), never for GPUs/CUDA. I’m just mentioning that to say: I think I’m familiar with the basics.

I know that PyTorch supports several devices, but I’m very unsure about how exactly it does that. I think one big philosophy of PyTorch is:

Use torch.tensor() for data.

Of course, we also need some logic, i.e. a neural network, for which we use e.g. nn.Module.

So we have “logic and data”.
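To make sure I’m using the terms correctly, here is a minimal sketch of what I mean by “data and logic” (nothing beyond the standard torch / torch.nn APIs; TinyNet and the shapes are just made up for illustration):

```python
import torch
import torch.nn as nn

# Data: a batch of 8 samples with 4 features each
x = torch.randn(8, 4)

# Logic: a tiny network as an nn.Module subclass
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

model = TinyNet()
out = model(x)  # forward pass: the logic applied to the data
```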

Now I know that I could, e.g., set up my neural network, call to(device) on it, set up some data with torch.tensor(..., device=device), and it works. But I could also have done it a hundred other ways.
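Concretely, this is the pattern I mean (just a sketch; I’m assuming a CUDA GPU may or may not be available):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2).to(device)    # move the logic to the device
x = torch.randn(8, 4, device=device)  # create the data directly on the device
out = model(x)                        # runs on the GPU if one is available
```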

What is a good way of doing things? I don’t have any fancy setup yet, and I don’t do any fancy work/load balancing or anything like that. I simply have my neural network, 1 CPU, and 1 GPU, and I want the work to be parallelized.

What I do currently is (sketched in code after the list):

  1. Get my neural network.
  2. Call to(device=device) to move my model to the device.
  3. Write a torch.utils.data.Dataset subclass that stores the device and, in __getitem__, moves the torch.tensors to that device.
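Roughly like this (a minimal sketch; MyDataset, the shapes, and the random data are made up):

```python
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Step 2: move the model to the device
model = nn.Linear(4, 2).to(device)

# Step 3: a Dataset subclass whose __getitem__ moves each sample to the device
class MyDataset(Dataset):
    def __init__(self, data, device):
        self.data = data      # a plain CPU tensor
        self.device = device

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx].to(self.device)

dataset = MyDataset(torch.randn(100, 4), device)
loader = DataLoader(dataset, batch_size=8)

for batch in loader:
    out = model(batch)  # model and batch now live on the same device
```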

Is that okay? Can I somehow globally define a device such that everything is just sent to that device? Do I have to model my data in a specific way to enable parallelism? I can imagine that I have to take some extra care here, because how else would PyTorch know which data depends on which?
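If I read the docs correctly, torch.set_default_device (available since PyTorch 2.0) sounds like the “global device” I have in mind; is something like this the recommended way?

```python
import torch
import torch.nn as nn

# Assuming PyTorch >= 2.0: from here on, newly created tensors
# and module parameters default to this device
torch.set_default_device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)  # parameters are created on the default device
x = torch.randn(8, 4)    # so is this tensor
out = model(x)
```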