Hi, the remote server has 32 CPUs, and I am currently running my code on it with 4 GPUs. I want to limit CPU usage: for example, can I restrict my code to only 16 CPUs? We are running a benchmark and are interested in this. How can I achieve that?
For inter-op parallelism you should be able to use torch.set_num_interop_threads(); for intra-op parallelism, MKL_NUM_THREADS should work.
For the intra-op parallelism settings, torch.set_num_threads always takes precedence over environment variables, and the MKL_NUM_THREADS variable takes precedence over OMP_NUM_THREADS.