In the DataLoader, what does “prefetch_factor” mean, and how does it affect the batch size and the way the data is loaded?
What is the best approach to choosing the num_workers?
This question came up because when I initialize my LSTM hidden and cell states, the shape is (num_layers, batch_size, hidden_size). If I set a batch size of 10000 (for example) and initialize the hidden and cell states in the init method, I get a dimension error saying that the batch_size parameter should be half of the batch size I set (5000 in this example).
From the documentation, prefetch_factor defines the number of batches (not individual samples) loaded in advance by each worker. A value of 2 means there will be a total of 2 * num_workers batches prefetched across all workers (default: 2 when num_workers > 0).
It should not affect your batch_size.
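Since prefetching does not change the shape of a batch, the halved size most likely comes from somewhere else. One possibility is a wrapper such as nn.DataParallel splitting each batch across two GPUs, but that is only a guess because the question does not show the setup. Whatever the cause, one way to avoid the mismatch is to size the states from the input inside forward instead of fixing the batch size in the init method. Below is a minimal sketch of that idea; the class name LSTMModel and all sizes are hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn


class LSTMModel(nn.Module):
    """Hypothetical model; the name and sizes are illustrative only."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)

    def forward(self, x):
        # Read the batch size from the input itself, so the states always
        # match whatever batch actually arrives (full, partial, or split).
        batch_size = x.size(0)
        h0 = torch.zeros(
            self.num_layers, batch_size, self.hidden_size,
            device=x.device, dtype=x.dtype,
        )
        c0 = torch.zeros_like(h0)
        out, (hn, cn) = self.lstm(x, (h0, c0))
        return out, hn


model = LSTMModel(input_size=8, hidden_size=16, num_layers=2)
full_out, full_h = model(torch.randn(10, 5, 8))  # a batch of 10 sequences
half_out, half_h = model(torch.randn(5, 5, 8))   # a batch of 5 sequences
```

The same module handles both batch sizes because nothing about the batch is stored at construction time.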
As for choosing the value for num_workers, it depends on your setup (CPU cores, storage speed, and how expensive each sample is to load). Throughput usually improves as you increase it from a small value. At the same time, each worker adds memory overhead, so you need enough memory to support them. At some point, a larger num_workers value will not boost performance anymore, and you should use a value around that point.
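A rough way to find that point is to time one pass over the data for a few candidate values. The sketch below assumes a synthetic TensorDataset as a stand-in for the real data; the helper time_one_epoch and the candidate values are hypothetical, and the timings will only be meaningful with your actual dataset and transforms.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_one_epoch(dataset, num_workers, batch_size=256):
    """Return the seconds needed to iterate once over the dataset."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:
        pass  # a real benchmark would also run the training step here
    return time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic stand-in data (an assumption); replace with the real dataset.
    dataset = TensorDataset(torch.randn(10_000, 32), torch.randn(10_000, 1))
    for workers in (0, 1, 2, 4):
        elapsed = time_one_epoch(dataset, workers)
        print(f"num_workers={workers}: {elapsed:.3f}s")
```

With data this small, the cost of starting worker processes can outweigh any gain, so num_workers=0 may look fastest here; heavier per-sample loading tends to shift the best value upward.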