Pinning multiple CPU cores to each GPU process

I am using torch.multiprocessing to create one process per GPU. The workload is a mixture of CPU and GPU computation. The data X is passed to all processes, and the other arguments tell each process which portion of X to work on.

import time
from collections import OrderedDict

import torch
import torch.multiprocessing as mp

def foo(X, args, gpu):
  with torch.no_grad():
    t1 = time.time()
    classes_to_process = list(range(1,int(number_of_classes)+1))
    list_of_tuples = [(0,0)] * number_of_classes
    Q = mp.Queue()
    done_event = [ mp.Event()  for k in range(number_of_classes)]

    NG = min(  len(gpu) , number_of_classes  )
    assert NG > 0
    processes = []
  
    for k in range(number_of_classes):
      r = k % NG
      p = mp.Process(target=my_process, args=([classes_to_process[k]], X, args, gpu[r], Q, done_event[k]))
      p.start()
      processes.append(p)
      if r == ( NG  - 1 ):
        for n in range(NG):
          L , D = Q.get()
          list_of_tuples[L-1] = (L, D)
          done_event[L-1].set()
        for p in processes:
          p.join()
        processes.clear()
    if r !=  ( NG  - 1 ):
      for n in range(r+1):
        L , D = Q.get()
        list_of_tuples[L-1] = (L, D)
        done_event[L-1].set()
      for p in processes:
        p.join()
    Y = OrderedDict(list_of_tuples)
    t2 = time.time()
    print("foo time = ", t2 - t1)
    del Q, done_event, L, D, p, list_of_tuples, processes
    return Y

It works without errors. However, each process gets one GPU but only a single CPU core, and that single core per process slows down the whole program.
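To confirm the pinning, you can check from inside a worker which cores it is allowed to run on; a minimal sketch using only the standard library (Linux only):

```python
import os

# Print the set of CPU cores the current process is allowed to run on
# (Linux only). If each worker reports a single core, something has
# restricted its affinity mask, e.g. an inherited mask or a launcher.
allowed = os.sched_getaffinity(0)  # 0 = the calling process
print(f"PID {os.getpid()} may run on cores: {sorted(allowed)}")
```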

How can I assign multiple CPU cores and a single GPU to each process?
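One approach I have in mind (an untested, Linux-only sketch; `set_worker_affinity` and its parameters are my own names, not any library API) is to give each worker an explicit block of cores with `os.sched_setaffinity`, then raise PyTorch's intra-op thread count to match with `torch.set_num_threads`:

```python
import os

def set_worker_affinity(rank, n_workers, cores_per_worker=None):
    # Pin the calling process to a contiguous block of CPU cores
    # (Linux only). rank is the worker index (0 .. n_workers-1);
    # by default the machine's cores are split evenly across workers.
    total = os.cpu_count()
    per = cores_per_worker or max(1, total // n_workers)
    start = (rank * per) % total
    cores = set(range(start, min(start + per, total)))
    os.sched_setaffinity(0, cores)  # 0 = the calling process
    return cores

# Inside my_process, before the heavy work, something like:
#   cores = set_worker_affinity(rank=r, n_workers=NG)
#   torch.set_num_threads(len(cores))  # let intra-op kernels use them
```

Would calling something like this at the top of each worker be the right way to do it, or is there a more idiomatic torch.multiprocessing mechanism?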