I am running my program for CPU-only inference on a Linux cluster (LSF).
The problem is that the more jobs are dispatched, the longer each forward() call takes.
When only one job is running, forward() takes about 0.01 s, but if I submit several jobs, the time increases roughly in proportion to the number of jobs.
I set torch::set_num_threads(), but it does not help.
If the jobs are distributed across different machines, the time does not seem to increase; however, the number of nodes is limited, so I cannot avoid the slowdown that way.
Does anyone know how to prevent this slowdown, or has anyone run into the same problem?
Let me answer myself:
solves the problem.
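For anyone hitting the same issue: slowdowns like this are typically oversubscription, where each job's intra-op thread pool spawns one thread per core on the node, so concurrent jobs on the same machine fight over the CPUs. One common remedy (my assumption here, not the original poster's confirmed fix) is to cap the thread pools via environment variables in the LSF job script before the program starts, so each job uses only the slot it was allocated. The binary name and `#BSUB` resource values below are hypothetical:

```shell
#!/bin/bash
#BSUB -n 1                  # request one slot per job (example values)
#BSUB -R "span[hosts=1]"

# Cap the OpenMP/MKL thread pools to the single slot this job owns,
# so several jobs on the same node no longer contend for all the cores.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

./my_inference_program      # hypothetical binary name
```

Setting the environment variables has the advantage of taking effect before the runtime initializes its thread pool; a `torch::set_num_threads()` call inside the program can come too late if any parallel work has already run.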