Just a quick thought from a cursory skim of this problem. I was wondering whether a purely multiprocessing
flow might help (on CPUs) here:
- chunk the lists (A, B)
- use multiprocessing.Pool.map(func, zip(A, B))
Something like (quick pseudocode):
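import functools, os
from multiprocessing import Pool
from typing import Generator, List

def chunks(a: List, b: List, n: int) -> Generator: ...  # yields (a_chunk, b_chunk) pairs
nproc = max(1, (os.cpu_count() or 2) - 1)  # cpu_count() can return None; leave one core free
with Pool(processes=nproc) as pool:
    func = functools.partial(process_matrices, *params)
    results = pool.map(func, chunks(A, B, n))  # func receives each (a_chunk, b_chunk) tuple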
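To make the idea above a bit more concrete, here is a minimal sketch that fills in the missing pieces. Everything beyond the pseudocode is an assumption on my part: the body of `process_matrices` is a hypothetical stand-in (a scaled elementwise product), `scale` is a made-up parameter standing in for `params`, and `run_chunked` is just a name I picked for the wrapper. I also used `Pool.starmap` instead of `Pool.map` so each `(a_chunk, b_chunk)` pair is unpacked into two arguments, and added a serial path for `nproc == 1`, which is handy for debugging.

```python
import functools
import os
from multiprocessing import Pool
from typing import Iterator, List, Optional, Tuple


def chunks(a: List, b: List, n: int) -> Iterator[Tuple[List, List]]:
    """Yield aligned (a_chunk, b_chunk) slices of at most n items each."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    for start in range(0, len(a), n):
        yield a[start:start + n], b[start:start + n]


def process_matrices(a_chunk: List, b_chunk: List, scale: float = 1.0) -> List[float]:
    """Hypothetical stand-in: scaled elementwise product of one chunk pair."""
    return [scale * x * y for x, y in zip(a_chunk, b_chunk)]


def run_chunked(a: List, b: List, n: int,
                nproc: Optional[int] = None, **params) -> List[float]:
    """Process aligned chunks of a and b, using worker processes when nproc > 1."""
    if nproc is None:
        # os.cpu_count() may return None; otherwise leave one core free.
        nproc = max(1, (os.cpu_count() or 2) - 1)
    func = functools.partial(process_matrices, **params)
    pairs = list(chunks(a, b, n))
    if nproc == 1:
        # Serial path: same logic, no worker processes.
        per_chunk = [func(a_chunk, b_chunk) for a_chunk, b_chunk in pairs]
    else:
        with Pool(processes=nproc) as pool:
            # starmap unpacks each (a_chunk, b_chunk) tuple into two arguments.
            per_chunk = pool.starmap(func, pairs)
    # Flatten the per-chunk results into one list, preserving input order.
    return [value for chunk in per_chunk for value in chunk]
```

When using the pool path, call `run_chunked(A, B, n=4, scale=2.0)` from under an `if __name__ == "__main__":` guard, since worker processes may re-import the main module on some platforms. Whether this actually helps depends on how expensive each chunk is relative to the cost of pickling the data to the workers, so it is worth timing against the serial path.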