I am training my model on the CPU and ran into some very strange behaviour (which I could work around), but I wanted to bring it up because I cannot imagine it is intended:
When I train my model on the CPU of my PC with 24 cores, all 24 cores are used at 100%, even though the model is rather small (that's why I don't train it on the GPU). Most of the workload is kernel usage, too. Training takes about 2.5 seconds per epoch. That PC runs PyTorch 1.0.1.post2.
To make training faster, I moved it to a server with 80 cores. There, however, I got exactly the same behaviour: during training all 80 cores ran at 100% load, and an epoch again took about 2.5 seconds on average. The server runs PyTorch 1.1.0.
After reading through some threads here, I tried torch.set_num_threads(1). This not only cut CPU usage down to a single core (as expected), but training also became much faster: about 1 second per epoch now.
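For reference, this is a minimal sketch of what I did (the model and sizes here are made up for illustration; only the set_num_threads call matters):

```python
import torch
import torch.nn as nn

# Limit intra-op parallelism to a single thread. For small models,
# the overhead of fanning tiny ops out across many cores can outweigh
# any benefit, so fewer threads can actually be faster.
torch.set_num_threads(1)
print(torch.get_num_threads())  # -> 1

# Hypothetical small model, just to have something to train on the CPU.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(256, 10)
y = torch.randn(256, 1)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss = nn.MSELoss()(model(x), y)
loss.backward()
optimizer.step()
```

Setting the environment variable OMP_NUM_THREADS=1 before launching the script seemed to have a similar effect, presumably because the CPU backend uses OpenMP.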
So I am not sure this behaviour is really desired: spreading the workload over all CPU cores not only consumes all resources, it is also much slower.