Data Parallelization / nn.DataParallel / multiprocessing for general tensor operations on a dataset

Just like PyTorch provides options such as nn.DataParallel() to make efficient use of multiple GPUs, is there a similar option for general-purpose tensor operations that don't necessarily need multiple GPUs, but rather multiprocessing?

For example, if I have a function like:

> def generate_dataset(dataloader, val=False):
>     f = open('new_dataset.txt', 'w')
>     for x, y in tqdm(dataloader, total=len(dataloader)):
>         pred = []
>         for x_d in x:
>             pred.append(general_tensor_operations(x_d))   # this ISN'T necessarily an nn.Module()
>         data = format_to_string_data([pred, y.item()])
>         f.write(data)
>     f.close()

Note that:

  1. I am making use of a torch DataLoader object,
  2. the tensor operation on x is NOT performed by an nn.Module() model,
  3. the tensor operation doesn't work on batched data, but rather on each slice of the mini-batch,
  4. and I write the obtained (pred, y) pairs to a new txt file.

Now the processing bottleneck is general_tensor_operations(). So if I want to parallelize such a function (generate_dataset()), how do I do it? Note that if I use multiple processes, they should work on non-overlapping subsets of the original dataloader (otherwise there will be duplicates in new_dataset.txt). A code snippet would help a ton.
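To make the non-overlap requirement concrete, this is roughly the split I picture, using torch.utils.data.Subset over contiguous index chunks (just an illustration; whether this is the right building block is part of my question):

> import torch
> from torch.utils.data import Subset
>
> def split_dataset(dataset, num_parts):
>     # contiguous, non-overlapping index chunks -> one Subset per future process
>     index_chunks = torch.chunk(torch.arange(len(dataset)), num_parts)
>     return [Subset(dataset, idx.tolist()) for idx in index_chunks]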

Thank you in advance!

If this is on the same machine, will torch.multiprocessing.queues.SimpleQueue work for you?
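A minimal sketch of how that could be wired up on a single machine, reusing the split_dataset helper above and keeping your own general_tensor_operations / format_to_string_data as placeholders (fork-based process start is assumed here):

> import torch.multiprocessing as mp
> from torch.utils.data import DataLoader
>
> def worker(subset, queue):
>     # each process iterates only over its own non-overlapping Subset
>     loader = DataLoader(subset, batch_size=1)
>     for x, y in loader:
>         pred = [general_tensor_operations(x_d) for x_d in x]
>         queue.put(format_to_string_data([pred, y.item()]))
>     queue.put(None)  # sentinel: this worker is done
>
> def generate_dataset_parallel(dataset, num_procs=4):
>     queue = mp.SimpleQueue()
>     procs = [mp.Process(target=worker, args=(subset, queue))
>              for subset in split_dataset(dataset, num_procs)]
>     for p in procs:
>         p.start()
>     finished = 0
>     with open('new_dataset.txt', 'w') as f:
>         # single writer process -> no duplicated or interleaved lines
>         while finished < len(procs):
>             item = queue.get()
>             if item is None:
>                 finished += 1
>             else:
>                 f.write(item)
>     for p in procs:
>         p.join()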

If you need cross-machine communication, you can use torch.distributed.rpc.
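The bare-bones shape of the RPC API looks like this (a sketch only, not tailored to the dataset-generation loop; it assumes MASTER_ADDR / MASTER_PORT are set and each machine is launched with its rank):

> import sys
> import torch
> import torch.distributed.rpc as rpc
>
> def heavy_op(x_d):
>     return x_d * 2  # stand-in for general_tensor_operations
>
> rank = int(sys.argv[1])
> rpc.init_rpc(f"worker{rank}", rank=rank, world_size=2)
> if rank == 0:
>     # rank 0 offloads the computation to worker1 and gets the result back
>     out = rpc.rpc_sync("worker1", heavy_op, args=(torch.ones(3),))
>     print(out)
> rpc.shutdown()  # blocks until all outstanding RPC work is finished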