How does torch.set_num_threads differ from numactl?

My cluster machine has 40 CPU cores and 4 GPUs. If I want to train 4 different models in parallel, one per GPU, what's the difference between capping threads with torch.set_num_threads and pinning each process to cores with numactl?
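
To make the comparison concrete, here is a minimal sketch of the two setups I have in mind. The script name, the 10-cores-per-process split, and the core ranges are just my assumptions for illustration, not something I've settled on:

```python
# train_one.py -- hypothetical per-model training script, one process per GPU.
#
# Option A: limit intra-op parallelism from inside each process, e.g.
#   python train_one.py --gpu 0
# with each of the 4 processes calling torch.set_num_threads(10),
# so the 40 cores are split evenly.
#
# Option B: pin the process to a CPU range (and memory node) externally, e.g.
#   numactl --physcpubind=0-9 --membind=0 python train_one.py --gpu 0
# and likewise 10-19, 20-29, 30-39 for GPUs 1-3.

import argparse
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--gpu", type=int, default=0)
args = parser.parse_args()

# Option A: cap the number of intra-op worker threads for this process.
torch.set_num_threads(10)

device = torch.device(f"cuda:{args.gpu}" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(1024, 1024).to(device)  # stand-in for a real model
x = torch.randn(64, 1024, device=device)
y = model(x)  # the actual training loop would go here
print(y.shape, "num_threads =", torch.get_num_threads())
```

As I understand it, Option A only tells PyTorch how many threads to spawn, while Option B actually restricts which physical cores (and NUMA memory) the process can use. Is that the whole difference, and does one matter more than the other for this 4-model setup?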