Prefetch_factor and num_workers

Andre_Amaral_IST · May 8, 2022, 8:02am

Hey,

In the DataLoader, what does “prefetch_factor” mean, and how does it affect the batch size and the way the data is loaded?

What is the best approach to choosing num_workers?

This question came up because I initialize my LSTM hidden and cell states with shape (num_layers, batch_size, hidden_size). If I set a batch size of 10000 (for example), initializing the hidden and cell states in my init method gives me a dimension error saying that the batch_size should be half of the batch size I set (5000 in this example).
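The model code is not shown in the post, so this is only a sketch under assumptions: the class name LSTMModel and the use of batch_first=True are hypothetical. A common way to avoid this kind of mismatch is to build the initial states inside forward() from the batch dimension of the incoming tensor, instead of from a fixed batch_size stored in init. The batch a layer actually receives can differ from the configured one: the last DataLoader batch is smaller unless drop_last=True, and nn.DataParallel splits each batch across the visible GPUs (with two GPUs, each replica sees half). Whether either applies here is an assumption.

```python
import torch
import torch.nn as nn


class LSTMModel(nn.Module):
    """Hypothetical model: initial states are sized from the actual input batch."""

    def __init__(self, input_size: int, hidden_size: int, num_layers: int):
        super().__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, seq_len, input_size) because batch_first=True.
        batch_size = x.size(0)  # batch size of *this* call, not a fixed config value
        # h0/c0 are (num_layers, batch, hidden_size) even when batch_first=True.
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size,
                         device=x.device, dtype=x.dtype)
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(x, (h0, c0))
        return out  # (batch, seq_len, hidden_size)
```

If zero initial states are all you need, you can also call self.lstm(x) without (h0, c0); nn.LSTM then defaults both states to zeros of the right shape.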

Regards!
André

nivek · May 9, 2022, 2:41pm

From the documentation, prefetch_factor is the number of batches loaded in advance by each worker: 2 means there will be a total of 2 * num_workers batches prefetched across all workers (default: 2 when num_workers > 0; it only applies when worker processes are used).
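As a worked example of that arithmetic, here is a small sketch using the numbers from the question (8 workers, batch_size=10000) and the default prefetch_factor=2. The helper name prefetched is hypothetical, and the count is approximate: it ignores the batch the training loop is currently consuming.

```python
def prefetched(num_workers: int, prefetch_factor: int, batch_size: int) -> tuple[int, int]:
    # Each worker keeps about prefetch_factor batches queued ahead of the training loop.
    batches_in_flight = prefetch_factor * num_workers
    # Prefetching does not change batch size, so each queued batch holds batch_size samples.
    samples_in_flight = batches_in_flight * batch_size
    return batches_in_flight, samples_in_flight


print(prefetched(num_workers=8, prefetch_factor=2, batch_size=10000))  # (16, 160000)
```

This is also why raising prefetch_factor or num_workers increases memory usage: with these settings roughly 160000 samples can be sitting in worker queues at once.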

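It does not affect your batch_size: every batch still contains batch_size samples (except possibly the last one). prefetch_factor only controls how many batches are prepared ahead of time.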
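A quick way to check this is to iterate the same loader with different prefetch_factor values and record the batch sizes. The toy dataset (100 samples) and the helper batch_sizes below are hypothetical; with batch_size=32 the loader should yield 32, 32, 32 and 4 samples regardless of prefetch_factor.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_sizes(prefetch_factor: int, num_workers: int = 2) -> list[int]:
    # Hypothetical toy data: 100 samples with 3 features each.
    dataset = TensorDataset(torch.arange(300, dtype=torch.float32).reshape(100, 3))
    loader = DataLoader(
        dataset,
        batch_size=32,
        num_workers=num_workers,        # prefetch_factor requires num_workers > 0
        prefetch_factor=prefetch_factor,
    )
    # TensorDataset yields 1-tuples, so each collated batch is a list with one tensor.
    return [x.shape[0] for (x,) in loader]


if __name__ == "__main__":
    print(batch_sizes(prefetch_factor=2))
    print(batch_sizes(prefetch_factor=8))
```

The __main__ guard matters on platforms that start worker processes with spawn (Windows, macOS), where the main module is re-imported in each worker.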

As for choosing num_workers, the best value depends on your setup. You should usually see better performance when increasing it from a small value to a somewhat larger one. At the same time, each worker is a separate process, so memory usage grows with num_workers and you need enough memory to accommodate it. Beyond some point, a larger num_workers no longer improves performance, and you should use a value around that point.
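One way to find that point is to time an epoch for a few candidate values. This is a minimal sketch, assuming a hypothetical SlowDataset whose time.sleep stands in for real per-sample loading cost; with your own dataset you would time your real DataLoader instead. Each timing includes worker start-up, so for a fairer comparison you may want to time several epochs.

```python
import time

import torch
from torch.utils.data import DataLoader, Dataset


class SlowDataset(Dataset):
    """Hypothetical dataset that simulates per-sample loading work."""

    def __init__(self, n: int = 512, delay_s: float = 0.001):
        self.n = n
        self.delay_s = delay_s

    def __len__(self) -> int:
        return self.n

    def __getitem__(self, idx: int) -> torch.Tensor:
        time.sleep(self.delay_s)  # stand-in for decoding / augmentation cost
        return torch.randn(16)


def time_one_epoch(num_workers: int, batch_size: int = 64) -> float:
    loader = DataLoader(SlowDataset(), batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:  # only loading is timed; no model work happens here
        pass
    return time.perf_counter() - start


def sweep(candidates=(0, 1, 2, 4, 8)) -> dict[int, float]:
    # Map each num_workers value to the seconds one epoch took.
    return {w: time_one_epoch(w) for w in candidates}


if __name__ == "__main__":
    for workers, seconds in sweep().items():
        print(f"num_workers={workers}: {seconds:.2f}s")
```

Pick the smallest value after which the epoch time stops dropping noticeably; going higher mostly adds memory use.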

Andre_Amaral_IST · May 9, 2022, 2:43pm

Thanks for your answer.
I am using 8 workers. I have seen people online suggest that a good rule of thumb for choosing num_workers is 4 * number of GPUs.
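If you want to use that rule of thumb as a starting point, a sketch could look like the following. The helper rule_of_thumb_workers is hypothetical, and two parts of it are added assumptions, not part of the rule as quoted: a CPU-only machine is treated as one device, and the result is capped at the number of CPU cores, since more worker processes than cores is unlikely to help.

```python
import os
from typing import Optional


def rule_of_thumb_workers(num_gpus: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical helper: 4 workers per GPU, capped at the CPU core count."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # os.cpu_count() can return None
    # Assumption: treat a machine without GPUs as a single device.
    suggested = 4 * max(num_gpus, 1)
    # Assumption: do not suggest more worker processes than CPU cores.
    return max(1, min(suggested, cpu_count))


print(rule_of_thumb_workers(num_gpus=2, cpu_count=16))  # 8
```

In PyTorch, num_gpus would typically come from torch.cuda.device_count(). The result is only a starting point; measuring a few values around it on your own setup is still the more reliable way to choose.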