Run time increases on a Linux cluster

hanabi · September 15, 2021, 6:24am

Hi,
I am running my program for CPU-only inference on a Linux cluster (LSF).
The problem is that the more jobs I submit, the longer the forward() function takes.
When only one job is running, forward() takes just 0.01s, but if I submit several jobs, the time increases proportionally.
I tried calling torch::set_num_threads(), but it did not help.
If all the jobs land on different machines, the time does not seem to increase. But the number of nodes is limited, so I cannot avoid the slowdown that way.
Does anyone know how to prevent this slowdown, or has anyone run into the same problem?

hanabi · September 15, 2021, 6:58am

Let me answer my own question:

at::set_num_threads(int num)

solves the problem.
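
A likely explanation, not confirmed in this thread, is that each process sizes its intra-op thread pool to the whole machine by default, so several jobs on one node compete for the same cores. Below is a minimal sketch of one way to pick the value passed to at::set_num_threads(). The helper threads_from_env is hypothetical (it is not part of libtorch), and reading LSB_DJOB_NUMPROC assumes that LSF exports that variable with the number of slots granted to the job; check this on your own cluster.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of libtorch): read a positive thread count
// from the environment variable `name`. Returns `fallback` when the variable
// is unset, empty, not a plain integer, or outside the range 1..1024.
int threads_from_env(const char* name, int fallback) {
    assert(fallback >= 1);  // the fallback itself must be a usable thread count

    const char* raw = std::getenv(name);
    if (raw == nullptr || *raw == '\0') {
        return fallback;
    }

    char* end = nullptr;
    const long value = std::strtol(raw, &end, 10);
    if (*end != '\0' || value < 1 || value > 1024) {
        return fallback;  // trailing junk or an implausible value
    }
    return static_cast<int>(value);
}

// Intended use in the inference program (requires <torch/torch.h>):
//
//   // Assumption: LSB_DJOB_NUMPROC holds the slots LSF granted to this job.
//   const int n = threads_from_env("LSB_DJOB_NUMPROC", 1);
//   at::set_num_threads(n);  // call once, before the first forward()
```

Falling back to 1 thread is a conservative choice for small models: a single job may run slightly slower, but the per-job time should stay roughly constant as more jobs share a node.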