Hi, the remote server has 32 CPUs, and I am currently running my code on it with 4 GPUs. I want to limit CPU usage. For example, can I restrict my code to use only 16 CPUs? We are running a benchmark and are interested in this. How can I achieve that?
For inter-op parallelism you should be able to use `torch.set_num_interop_threads()`; for intra-op parallelism, `torch.set_num_threads()`, `OMP_NUM_THREADS`, and `MKL_NUM_THREADS` should work.
For the intra-op parallelism settings, `at::set_num_threads` / `torch.set_num_threads` always take precedence over the environment variables, and the `MKL_NUM_THREADS` variable takes precedence over `OMP_NUM_THREADS`.
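Putting the above together, a minimal sketch of limiting a run to 16 CPU threads might look like this (assuming the environment variables are set before `torch` is imported, since OpenMP/MKL read them at load time, and that `set_num_interop_threads` is called before any parallel work starts, as it can only be set once):

```python
import os

# Set the environment variables before importing torch, so that
# OpenMP and MKL pick them up when the libraries are loaded.
os.environ["OMP_NUM_THREADS"] = "16"
os.environ["MKL_NUM_THREADS"] = "16"

import torch

# Intra-op parallelism: threads used within a single op (e.g. a matmul).
# This takes precedence over OMP_NUM_THREADS / MKL_NUM_THREADS.
torch.set_num_threads(16)

# Inter-op parallelism: threads used to run independent ops concurrently.
# Must be called before any inter-op parallel work has started.
torch.set_num_interop_threads(16)

print(torch.get_num_threads())
print(torch.get_num_interop_threads())
```

Note that this caps the number of threads PyTorch uses, which for a CPU-bound benchmark effectively limits it to 16 cores; it does not pin the process to specific CPUs (for that, a tool like `taskset` on Linux would be needed).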