Asynchronous data loading onto GPU with concurrent processing

Hi all

I am trying to do a bunch of algebra on data arrays being loaded from disk. So I have 2 steps: data_load and data_process. It turns out that data_load takes about 50% as long as data_process (nice problem to have btw!). This raises the possibility of loading the next batch onto the GPU while waiting for the current batch to finish.
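As a rough back-of-the-envelope estimate (assuming loading and processing overlap perfectly and ignoring any other overhead), with $N$ batches, a processing time of $P$ per batch and a load time of about $0.5P$:

$$
T_{\text{serial}} = N\,(0.5P + P) = 1.5\,NP, \qquad T_{\text{overlap}} \approx 0.5P + NP \approx NP
$$

so overlapping can at best cut the wall-clock time by about a third, i.e. roughly a $1.5\times$ speedup.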

So… I would like to figure out a way to run data_load(next_batch) (using multiprocessing?) plus the transfer to GPU while data_process(current_batch) is running. I’m a bit confused about whether this is even possible, and whether nn.DataParallel modules are meant for this kind of single-GPU work.

I tried coding something simple using CPU multiprocessing, but I think it defeats the purpose, as starting a pool is slow (I would need to do this multiple times), and I’m not even sure if this async call does what I’m hoping it will do.

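```python
import multiprocessing as mp
import torch

pool = mp.Pool(processes=1)
data_next = pool.apply_async(load_data_next, args=(chunk_ctr,))  # returns an AsyncResult immediately
# .get() blocks until the worker has returned the numpy array
data[current_batch] = torch.from_numpy(data_next.get()).float().to(device)
```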
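The pool start-up cost can be paid only once by creating the worker before the loop and always keeping one load in flight. A minimal sketch of that prefetch pattern, using a `ThreadPoolExecutor` from the standard library instead of `mp.Pool` (a swapped-in choice: disk reads mostly release the GIL, so one background thread is often enough; the same structure works with an `mp.Pool` created once outside the loop). `run_pipeline`, `load_fn` and `process_fn` are hypothetical names standing in for data_load / data_process.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(load_fn, process_fn, n_chunks):
    """Overlap load_fn(i + 1) with process_fn(chunk i) using one reusable worker.

    load_fn and process_fn are hypothetical placeholders for data_load / data_process.
    """
    results = []
    # Create the executor once; reusing it avoids paying start-up cost per chunk.
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(load_fn, 0)  # start loading the first chunk
        for i in range(n_chunks):
            data = future.result()  # wait for chunk i (often already finished)
            if i + 1 < n_chunks:
                future = pool.submit(load_fn, i + 1)  # load the next chunk in the background
            # In the post, torch.from_numpy(data).float().to(device) and
            # data_process would go here while chunk i + 1 is loading.
            results.append(process_fn(data))
    return results
```

With a load time of 0.1 s and a processing time of 0.1 s per chunk, 4 chunks take roughly 0.5 s instead of 0.8 s when run serially.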

Any advice on this issue?
Thanks in advance!


PyTorch's DataLoader already uses multiprocessing. Since CUDA operations are asynchronous, the GPU work already runs in parallel with the loading, if that’s what you are wondering. In fact, when you set num_workers=N, N batches will be loaded in parallel (if you have enough CPUs, of course).
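A minimal sketch of what this looks like, assuming the data fits in a `TensorDataset` (the random tensors, `make_loader` and `run` are hypothetical stand-ins for the post's arrays and steps). The worker processes prepare batches on the CPU, `pin_memory` places them in page-locked memory when a GPU is present, and `non_blocking=True` lets the host-to-device copy return without waiting when the source is pinned. The `if __name__ == "__main__":` guard is needed on platforms that start workers with spawn (Windows, macOS).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(num_workers=2):
    # Hypothetical in-memory data standing in for arrays read from disk.
    x = torch.randn(1000, 64)
    y = torch.randint(0, 10, (1000,))
    return DataLoader(
        TensorDataset(x, y),
        batch_size=100,
        num_workers=num_workers,               # worker processes build batches in parallel
        pin_memory=torch.cuda.is_available(),  # page-locked buffers for async copies
    )


def run(num_workers=2):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    total = 0
    for xb, yb in make_loader(num_workers):
        # non_blocking only makes the copy asynchronous for pinned source -> CUDA target.
        xb = xb.to(device, non_blocking=True)
        yb = yb.to(device, non_blocking=True)
        result = (xb @ xb.T).sum()  # stand-in for data_process
        total += xb.shape[0]
    return total


if __name__ == "__main__":
    print(run())
```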

The problem is that you want to allocate those batches on the GPU inside the worker processes. I think that’s not possible (if I’m not wrong). But if you want to try, it’s a matter of allocating the tensors on the GPU inside `__getitem__`. I’d say it will throw an error, but you can try.
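A sketch of the usual workaround, assuming each chunk is stored as its own `.npy` file (the file layout, `ChunkDataset` and `process_all` are hypothetical): keep `__getitem__` on the CPU so the disk read and the NumPy-to-tensor conversion happen in the worker processes, and do the `.to(device)` call in the main process. This also sidesteps CUDA initialization problems in forked workers.

```python
import os
import tempfile

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class ChunkDataset(Dataset):
    """Hypothetical dataset: one .npy file per chunk, read lazily in __getitem__."""

    def __init__(self, paths):
        self.paths = list(paths)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        arr = np.load(self.paths[idx])        # the disk read runs inside a worker process
        return torch.from_numpy(arr).float()  # returned as a CPU tensor; no CUDA calls here


def process_all(paths, num_workers=2):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(
        ChunkDataset(paths),
        batch_size=None,                       # each item is already a whole chunk
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),  # lets the CUDA copy below run asynchronously
    )
    sums = []
    for chunk in loader:
        chunk = chunk.to(device, non_blocking=True)  # GPU transfer happens in the main process
        sums.append(chunk.sum().item())              # stand-in for data_process
    return sums


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(3):
            path = os.path.join(tmp, f"chunk_{i}.npy")
            np.save(path, np.full((4, 5), i, dtype=np.float64))
            paths.append(path)
        print(process_all(paths))  # [0.0, 20.0, 40.0]
```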

What I can tell you for sure is that you would only be saving the time needed to copy the batch to the GPU, and nothing else (assuming that copy isn’t already overlapped with the computation).
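If that copy time is worth hiding, one common pattern is to issue the copy of the next batch on a separate CUDA stream while the current batch is processed on the default stream. A minimal sketch, assuming the loader yields single tensors and uses `pin_memory=True` (without pinned memory the copy is not truly asynchronous). `CudaPrefetcher` is a hypothetical helper, not a PyTorch API; on a CPU-only machine it simply passes batches through.

```python
import torch
from torch.utils.data import DataLoader


class CudaPrefetcher:
    """Hypothetical helper: copy batch i + 1 to the GPU on a side stream while batch i is processed.

    Assumes the wrapped loader yields single tensors. Without CUDA it just passes batches through.
    """

    def __init__(self, loader, device):
        self.it = iter(loader)
        self.device = device
        self.use_cuda = device.type == "cuda"
        self.stream = torch.cuda.Stream() if self.use_cuda else None
        self.next_batch = None
        self._preload()

    def _preload(self):
        try:
            batch = next(self.it)
        except StopIteration:
            self.next_batch = None
            return
        if self.use_cuda:
            with torch.cuda.stream(self.stream):
                # Only asynchronous if `batch` is in pinned (page-locked) host memory.
                self.next_batch = batch.to(self.device, non_blocking=True)
        else:
            self.next_batch = batch

    def __iter__(self):
        return self

    def __next__(self):
        if self.next_batch is None:
            raise StopIteration
        if self.use_cuda:
            # The compute stream must not read the batch before the side-stream copy finishes.
            torch.cuda.current_stream().wait_stream(self.stream)
            # Tell the caching allocator the tensor is now also used on the compute stream.
            self.next_batch.record_stream(torch.cuda.current_stream())
        batch = self.next_batch
        self._preload()  # start copying the following batch before returning this one
        return batch


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    data = torch.randn(1000, 64)
    loader = DataLoader(data, batch_size=100, pin_memory=torch.cuda.is_available())
    for xb in CudaPrefetcher(loader, device):
        result = (xb @ xb.T).sum()  # stand-in for data_process; overlaps the next copy on CUDA
```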