Fast way to use `map` in PyTorch?

Just a quick thought from a cursory skim of this problem: I was wondering whether a pure multiprocessing approach might help here (on CPUs):

  1. Chunk the lists (`A`, `B`).
  2. Process the chunks in parallel with `multiprocessing.Pool.map(func, zip(A, B))`.
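In step 2, `Pool.map` passes each `(a, b)` tuple produced by `zip` to `func` as a single argument. If your function takes the two matrices as separate parameters, `Pool.starmap` unpacks the tuples for you. Here is a minimal sketch; `add_pair` is a hypothetical stand-in for the real matrix work:

```python
from multiprocessing import Pool


def add_pair(a: int, b: int) -> int:
    """Hypothetical worker that takes A_i and B_i as two separate arguments."""
    return a + b


def run(a, b, nproc=2):
    with Pool(processes=nproc) as pool:
        # starmap unpacks each (a_i, b_i) tuple from zip into add_pair(a_i, b_i);
        # plain map would pass the whole tuple as one argument instead.
        return pool.starmap(add_pair, zip(a, b))


if __name__ == "__main__":
    print(run([1, 2, 3], [10, 20, 30]))  # [11, 22, 33]
```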

Something like this (quick sketch):

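```python
import functools, os
from multiprocessing import Pool

def chunks(a, b, n):  # yield matching slices of at most n items from A and B
    for i in range(0, len(a), n):
        yield a[i:i + n], b[i:i + n]
if __name__ == "__main__":  # required where workers are spawned (Windows, macOS)
    nproc = max(1, (os.cpu_count() or 2) - 1)  # leave one core free, but use at least one
    with Pool(processes=nproc) as pool:
        func = functools.partial(process_matrices, *params)  # your own worker and its fixed args
        results = pool.map(func, chunks(a, b, n))  # a, b: your lists; n: chunk size
```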
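For a version you can run end to end, here is a self-contained sketch. It assumes a hypothetical `process_matrices(scale, pair)` worker that multiplies each `(A_i, B_i)` pair in its chunk and scales the result. It also uses a plain-Python `matmul` in place of `torch.matmul`, so the example runs without PyTorch installed. The chunk size is chosen so that each worker gets roughly one chunk:

```python
import functools
import os
from multiprocessing import Pool
from typing import Iterator, List, Sequence, Tuple

Matrix = List[List[float]]
Pair = Tuple[Sequence[Matrix], Sequence[Matrix]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product (stand-in for torch.matmul)."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)] for row in x]


def chunks(a: Sequence[Matrix], b: Sequence[Matrix], n: int) -> Iterator[Pair]:
    """Yield matching slices of at most n items from A and B."""
    for i in range(0, len(a), n):
        yield a[i:i + n], b[i:i + n]


def process_matrices(scale: float, pair: Pair) -> List[Matrix]:
    """Hypothetical worker: multiply each (A_i, B_i) in one chunk, then scale."""
    a_chunk, b_chunk = pair
    return [
        [[scale * v for v in row] for row in matmul(x, y)]
        for x, y in zip(a_chunk, b_chunk)
    ]


def parallel_process(a, b, scale=1.0, nproc=None):
    nproc = nproc or max(1, (os.cpu_count() or 2) - 1)
    n = max(1, -(-len(a) // nproc))  # ceil(len(a) / nproc): about one chunk per worker
    func = functools.partial(process_matrices, scale)  # fix the leading argument
    with Pool(processes=nproc) as pool:
        per_chunk = pool.map(func, chunks(a, b, n))  # results come back in input order
    return [m for chunk in per_chunk for m in chunk]  # flatten back to one list


if __name__ == "__main__":
    A = [[[1.0, 2.0], [3.0, 4.0]]] * 8
    B = [[[1.0, 0.0], [0.0, 1.0]]] * 8  # identity matrices
    out = parallel_process(A, B, scale=2.0)
    print(len(out), out[0])  # 8 [[2.0, 4.0], [6.0, 8.0]]
```

Chunking by hand keeps each task coarse. An alternative is to pass the pairs directly and let `Pool.map`'s own `chunksize` argument do the batching. If the real worker uses PyTorch ops, calling `torch.set_num_threads(1)` inside each worker may help avoid oversubscribing the cores, but that is worth measuring rather than assuming.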