We do some data pre-processing with PyTorch and NumPy before feeding samples into the network for training. This is currently done serially, which slows down our training process. Is there an approach in PyTorch to speed it up, such as multiprocessing? Any other suggestions are also welcome. Thanks.
The general idea is to create a dataset class and implement the pre-processing inside its `__getitem__` method. If you then create a `DataLoader` from that dataset and set `num_workers` to a value greater than 0, worker processes will fetch and preprocess samples in parallel while the main process trains.
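A minimal sketch of that pattern, assuming your preprocessing is a per-sample NumPy transform (the normalization here is just a placeholder for your real steps, and `PreprocessedDataset` is a hypothetical name):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class PreprocessedDataset(Dataset):
    """Runs NumPy preprocessing per sample inside __getitem__,
    so DataLoader workers execute it in parallel."""

    def __init__(self, raw_data):
        self.raw_data = raw_data  # e.g. a list of NumPy arrays

    def __len__(self):
        return len(self.raw_data)

    def __getitem__(self, idx):
        sample = self.raw_data[idx]
        # Placeholder preprocessing: normalize the sample.
        sample = (sample - sample.mean()) / (sample.std() + 1e-8)
        return torch.from_numpy(sample).float()

if __name__ == "__main__":
    raw = [np.random.rand(8) for _ in range(32)]
    ds = PreprocessedDataset(raw)
    # num_workers > 0 spawns worker processes that call __getitem__
    # concurrently; the main process only collates finished batches.
    loader = DataLoader(ds, batch_size=4, num_workers=2)
    for batch in loader:
        print(batch.shape)  # torch.Size([4, 8])
```

Note the `if __name__ == "__main__":` guard: it is required on platforms that spawn (rather than fork) worker processes. Tune `num_workers` empirically; too many workers can add overhead rather than remove it.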