Number of workers = 0 vs. number of workers = 4

Can someone explain why the same code runs faster with number of workers = 0 than with number of workers = 4?

This can happen when loading the data with a single worker (the main process) is faster than loading it with 4 extra workers and then transferring it to the main process. How fast that transfer is depends on the data as well as on the system’s architecture.
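One way to see which case applies on a given machine is to time a short pass with each setting. The sketch below is only an illustration: `time_one_pass` and `compare_worker_counts` are made-up helper names, and `make_loader` is assumed to be any function that builds an iterable loader for a given worker count. The commented-out usage assumes a PyTorch `DataLoader`, which the thread does not state explicitly, and `my_dataset` is a placeholder.

```python
import time


def time_one_pass(make_loader, num_workers, max_batches=50):
    """Return the seconds needed to pull up to max_batches batches.

    make_loader is a hypothetical factory: it takes a worker count and
    returns an iterable loader. The timer starts before iteration begins,
    so any worker start-up cost is included in the measurement.
    """
    loader = make_loader(num_workers)
    start = time.perf_counter()
    for i, _batch in enumerate(loader):
        if i + 1 >= max_batches:
            break
    return time.perf_counter() - start


def compare_worker_counts(make_loader, candidates=(0, 2, 4), max_batches=50):
    """Time each candidate worker count and return {count: seconds}."""
    return {n: time_one_pass(make_loader, n, max_batches) for n in candidates}


# Possible usage, assuming PyTorch (my_dataset is a placeholder):
#   from torch.utils.data import DataLoader
#   timings = compare_worker_counts(
#       lambda n: DataLoader(my_dataset, batch_size=64, num_workers=n))
#   best = min(timings, key=timings.get)
```

A short pass like this is noisy, so it is probably worth repeating it a few times and on the real dataset before drawing conclusions.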

Is there any way to tell which setting is better other than trying them all?

Also, do you know how to find out how many workers are available on the system (on Ubuntu)?
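There is no fixed "number of workers" that a system provides, but the number of CPU cores the process may use is a common upper guide. A minimal sketch, assuming Python on Linux; `available_cpus` is a made-up helper name. On Ubuntu, the `nproc` shell command reports a similar figure.

```python
import os


def available_cpus():
    """Return the number of CPUs this process is allowed to run on."""
    # os.sched_getaffinity exists on Linux and respects CPU affinity limits
    # (e.g. taskset); fall back to the total CPU count where it is missing.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print(available_cpus())
```

The best worker count is not necessarily equal to this number, so it is better treated as a ceiling for the values worth timing.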