Training limited to 2 CPUs

After some more investigation, it seems to me that the data preparation part of the data loader workers is only being run on two cores.

More details below…

I checked where each process is running.

Both validation & training:

  • 6 workers
  • 6 processes (always running on cores 2 & 5) with 2 children
  • The 6 processes share the two cores evenly (about 33% each)
  • When data is loaded onto the GPU, I see the other cores become active (more obvious during validation)
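
For anyone who wants to repeat this kind of check, here is a rough sketch of one way to do it on Linux. It is not necessarily how the numbers above were collected. It reads /proc/<pid>/stat to find the core a process last ran on and uses os.sched_getaffinity to list the cores the process is allowed to use. The helper names (last_cpu, allowed_cpus, child_pids, report) are made up for illustration, and it assumes the data loader workers show up as direct children of the training process.

```python
import os


def _stat_fields(pid):
    """Return the fields of /proc/<pid>/stat that follow the command name."""
    with open(f"/proc/{pid}/stat") as f:
        stat = f.read()
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' instead of on whitespace alone.
    return stat.rsplit(")", 1)[1].split()


def last_cpu(pid):
    """Index of the CPU the process last ran on (field 39 of the stat file)."""
    # After dropping pid and comm, field 3 sits at index 0, so field 39 is index 36.
    return int(_stat_fields(pid)[36])


def parent_pid(pid):
    """Parent PID of the process (field 4 of the stat file)."""
    return int(_stat_fields(pid)[1])


def allowed_cpus(pid):
    """Sorted list of CPUs the scheduler may place this process on."""
    return sorted(os.sched_getaffinity(pid))


def child_pids(pid):
    """PIDs of the direct children of pid, found by scanning /proc."""
    children = []
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            if parent_pid(int(name)) == pid:
                children.append(int(name))
        except OSError:
            # The process exited or is not readable; skip it.
            continue
    return children


def report(pid):
    """Print allowed and last-used CPUs for pid and its direct children."""
    for p in [pid] + child_pids(pid):
        try:
            print(f"pid {p}: allowed={allowed_cpus(p)} last_cpu={last_cpu(p)}")
        except OSError:
            continue


if __name__ == "__main__":
    # Assumption: replace os.getpid() with the PID of the training script.
    report(os.getpid())
```

If the allowed list for the worker processes only contains two cores (for example 2 and 5), the restriction comes from the CPU affinity of the processes rather than from the amount of work available.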

Training:

  • Lower GPU utilisation; data is sent to the GPU less often
  • I assume the two active cores are preparing data before it is sent to the GPU

Validation:

  • Data is loaded onto the GPU more often
  • When data is loaded onto the GPU (i.e. GPU utilisation is visible as well), the remaining cores become active