Training limited to 2 CPUs

johnorford · August 21, 2023, 8:50am

After some more investigation, it seems to me that the data preparation part of the data loader workers is only being run on two cores.

More details below…

I checked where each process is running.

Both validation & training:

6 workers
6 processes (always running on cores 2 & 5) with 2 children
6 processes evenly utilise the two cores (33% each)
When data is loaded into the GPU, I see other cores activated (more obvious during validation)
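
For anyone wanting to reproduce the check above, here is a minimal sketch of one way to see which cores a process is allowed to run on. It assumes Linux (`os.sched_getaffinity` is not available on macOS or Windows), and the helper name `allowed_cores` is made up for illustration. It reports the affinity mask, not the core a process is currently scheduled on, so it only helps confirm or rule out a restricted affinity as the reason for the two-core behaviour.

```python
import os


def allowed_cores(pid: int = 0) -> list[int]:
    """Return the sorted list of CPU cores the given process may run on.

    pid=0 means the calling process. Where sched_getaffinity is not
    available (e.g. macOS, Windows) this falls back to listing all cores,
    which is only a guess and not a real affinity mask.
    """
    if hasattr(os, "sched_getaffinity"):
        return sorted(os.sched_getaffinity(pid))
    return list(range(os.cpu_count() or 1))


if __name__ == "__main__":
    # Print the cores the current process is permitted to use.
    print(f"pid {os.getpid()} may run on cores: {allowed_cores()}")
```

If the loader is a PyTorch `DataLoader` (an assumption, the framework is not stated here), a function like this could be called from a `worker_init_fn` to print the mask inside each worker. If every worker reports only two cores, the restriction is probably coming from the affinity mask and not from the data preparation code itself.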

Training:

Lower GPU utilisation, data is sent to the GPU less often
I assume the two cores that are activated are loading data prior to sending to GPU

Validation:

Loads data into GPU more often
When data is loaded into the GPU (i.e. you can also see GPU utilisation), the remaining cores are activated