Hi, I have a custom dataloader in which I explicitly use Python's multiprocessing (a Pool with 8 worker processes) to parallelize data preprocessing. I wanted to know how that affects my torch.utils.data.DataLoader call: should the num_workers argument be set to 8, or can I leave it at 0?
My custom loader looks like this:
    from multiprocessing import Pool

    def processParams(params):
        # <some operations on params>
        return params

    def processParamsParallel(params, pool):
        results = pool.map(processParams, params)
        return results

    class DataLoader(object):
        def __init__(self, params, maxId):
            self.params = params
            self.id = 0
            self.maxId = maxId
            self.pool = Pool(processes=8)

        def __iter__(self):
            self.id = 0  # reset at the start of each epoch
            while self.id < self.maxId:
                results = processParamsParallel(self.params, self.pool)
                self.id += 1
                yield results
It’s a very rough example of what I am trying to do.
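To make the pattern concrete, here is a minimal self-contained version that actually runs. The squaring step and the params list are hypothetical stand-ins for my real preprocessing; the point is just that pool.map fans the work out across worker processes and preserves input order:

```python
from multiprocessing import Pool

def process_params(p):
    # hypothetical stand-in for <some operations on params>
    return p * p

class PoolLoader:
    def __init__(self, params, max_id, workers=8):
        self.params = params
        self.max_id = max_id
        self.workers = workers

    def __iter__(self):
        # one Pool per epoch; the worker processes do the preprocessing
        with Pool(processes=self.workers) as pool:
            for _ in range(self.max_id):
                # pool.map preserves the order of the inputs
                yield pool.map(process_params, self.params)

if __name__ == "__main__":
    loader = PoolLoader([1, 2, 3], max_id=2, workers=2)
    print(list(loader))  # [[1, 4, 9], [1, 4, 9]]
```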
Now, in the torch call:

    dl = DataLoader(params, 50)
    dl_torch = torch.utils.data.DataLoader(dl, num_workers=<what_here?>, prefetch_factor=<what_here?>)
On a similar note, how will prefetch_factor be affected, given that the workers are spawned not by the torch call but by the custom loader itself?
From the torch.utils.data docs:

    prefetch_factor – Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers. (default: 2)
Thank you in advance!