Torchrun launches each process on the same CPUs/GPUs

Hello, I am trying to run the multi-node example from the PyTorch examples repo using SLURM, on a cluster where every node has 8 GPUs and 56 CPUs: examples/distributed/ddp-tutorial-series/multinode.py at main · pytorch/examples · GitHub

This is how I launch it:

srun -n2 -N2 -c7 --tasks-per-node=1 --threads-per-core=1 bash -c "torchrun --nproc_per_node=8 --nnodes=2 --master_addr=$master_ip_addr --master_port=3442 --node_rank \$SLURM_PROCID ./multinode.py 2000 10"

I notice that torchrun launches every process with the same CPU set and the same GPU, as reported via the numactl library (GitHub - numactl/numactl: NUMA support for Linux):

CPU affinity for PID 3452774: 0 1 2 3 4 5 6
GPU affinity for PID 3452774: 4
CPU affinity for PID 3452775: 0 1 2 3 4 5 6
GPU affinity for PID 3452775: 4
CPU affinity for PID 3452771: 0 1 2 3 4 5 6
GPU affinity for PID 3452771: 4
CPU affinity for PID 1004587: 0 1 2 3 4 5 6
GPU affinity for PID 1004587: 4
CPU affinity for PID 3452772: 0 1 2 3 4 5 6
GPU affinity for PID 3452772: 4
CPU affinity for PID 1004583: 0 1 2 3 4 5 6
GPU affinity for PID 1004583: 4
CPU affinity for PID 1004580: 0 1 2 3 4 5 6
GPU affinity for PID 1004580: 4
CPU affinity for PID 3452773: 0 1 2 3 4 5 6
GPU affinity for PID 3452773: 4
CPU affinity for PID 1004584: 0 1 2 3 4 5 6
GPU affinity for PID 1004584: 4
CPU affinity for PID 1004582: 0 1 2 3 4 5 6
GPU affinity for PID 1004582: 4
CPU affinity for PID 3452770: 0 1 2 3 4 5 6
GPU affinity for PID 3452770: 4
CPU affinity for PID 1004586: 0 1 2 3 4 5 6
GPU affinity for PID 1004586: 4
CPU affinity for PID 3452776: 0 1 2 3 4 5 6
GPU affinity for PID 3452776: 4
CPU affinity for PID 1004585: 0 1 2 3 4 5 6
GPU affinity for PID 1004585: 4
CPU affinity for PID 1004581: 0 1 2 3 4 5 6
GPU affinity for PID 1004581: 4
CPU affinity for PID 3452777: 0 1 2 3 4 5 6
GPU affinity for PID 3452777: 4
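
(The affinity above was obtained via numactl; a simplified pure-Python check along the following lines can be used to double-check from inside each rank. This is only a sketch, assuming Linux and the LOCAL_RANK environment variable that torchrun exports.)

# Sketch: print what each spawned rank actually sees
import os
import torch

local_rank = int(os.environ.get("LOCAL_RANK", "0"))
cpus = sorted(os.sched_getaffinity(0))  # CPU ids this process may run on
print(f"PID {os.getpid()} (local rank {local_rank}): "
      f"CPU affinity {cpus}, visible GPUs {torch.cuda.device_count()}")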

Am I launching torchrun in a sensible way under SLURM? Is it expected behavior for torchrun to launch all processes on the same CPUs? And is there a workaround to stop torchrun from using CPUs that are already assigned to another process?
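
For context, this is the kind of manual per-rank pinning I could add to the training script if there is no cleaner way at the SLURM/torchrun level. It is only a sketch, assuming torchrun exports LOCAL_RANK and LOCAL_WORLD_SIZE and that calling os.sched_setaffinity (Linux-only) from the script is acceptable:

import os

local_rank = int(os.environ["LOCAL_RANK"])
local_world_size = int(os.environ["LOCAL_WORLD_SIZE"])

# CPUs this process is currently allowed to run on (here: 0-6 for every rank)
allowed = sorted(os.sched_getaffinity(0))

# Hand each local rank a disjoint slice of that set
per_rank = max(1, len(allowed) // local_world_size)
mine = allowed[local_rank * per_rank:(local_rank + 1) * per_rank]
if mine:  # with 7 CPUs and 8 ranks the last slice is empty, so leave that rank unpinned
    os.sched_setaffinity(0, mine)

print(f"local rank {local_rank}: CPUs {sorted(os.sched_getaffinity(0))}")

This only spreads the ranks over the CPUs that SLURM already bound to the step (0-6 here), so it does not address why only 7 of the 56 CPUs and a single GPU are assigned in the first place.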