After some more investigation, it seems to me that the data preparation part of the data loader workers is only being run on two cores.
More details below…
I checked where each process is running.
Both validation & training:
- 6 workers
- 6 processes (always running on cores 2 & 5) with 2 children
- The 6 processes share the two cores evenly (33% each)
- When data is loaded onto the GPU, I see other cores become active (more obvious during validation)
Training:
- Lower GPU utilisation, with data sent to the GPU less often
- I assume the two active cores are preparing data before it is sent to the GPU
Validation:
- Loads data onto the GPU more often
- When data is loaded onto the GPU (i.e. you can also see GPU utilisation), the remaining cores become active
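To tell whether the workers are actually restricted to those two cores or just happen to be scheduled there, something like the sketch below might help. It is only a diagnostic, it assumes Linux, and the function names are my own. The last function assumes a PyTorch-style `DataLoader` that accepts a `worker_init_fn` callable, which may not match the loader used here.

```python
import os
from pathlib import Path
from typing import Optional


def allowed_cores(pid: int = 0) -> list[int]:
    """Cores the scheduler may place this process on (pid 0 means the caller)."""
    if hasattr(os, "sched_getaffinity"):  # only available on Linux
        return sorted(os.sched_getaffinity(pid))
    return list(range(os.cpu_count() or 1))  # fallback: assume no restriction


def current_core(pid: Optional[int] = None) -> Optional[int]:
    """Core the process last ran on, read from /proc/<pid>/stat (Linux only)."""
    stat = Path(f"/proc/{pid if pid is not None else os.getpid()}/stat")
    if not stat.exists():
        return None
    # The command name (field 2) may contain spaces, so split after its closing ")".
    fields = stat.read_text().rsplit(")", 1)[1].split()
    return int(fields[36])  # field 39 overall ("processor"); fields[0] is field 3


def log_worker_cores(worker_id: int) -> None:
    """Hypothetical hook, e.g. DataLoader(..., worker_init_fn=log_worker_cores)."""
    print(f"worker {worker_id} pid {os.getpid()}: "
          f"on core {current_core()}, allowed {allowed_cores()}")


if __name__ == "__main__":
    log_worker_cores(0)
```

If each worker reports only cores 2 and 5 as allowed, the limit is coming from CPU affinity inherited by the workers. If every core is allowed, the cause is more likely in how the work is scheduled or threaded.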

