Applying the same module to a list of inputs in parallel

Hi. I’m working on an architecture where my input is a list of tensors (say 5 or 10), and I need to feed them all through the same subnet, take their respective outputs and aggregate them. I implemented this in a sequential manner like:

outs = []
for inp in inputs:
    out = module(inp)
    outs.append(out)  # collect each output so it can be aggregated afterwards

result = some_aggregation_logic(outs)

I’m wondering what the appropriate way to parallelize that for-loop is (if there even is one). It’s probably worth noting that each call out = module(inp) already works on a whole batch: inp holds the i-th object for every element in the batch.
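For reference, one common way to avoid the loop is to stack the list into one tensor and fold the list dimension into the batch dimension, so the module runs once. This is only a sketch under assumptions that are not stated above: every input has the same shape, the module treats the leading dimension as the batch, and it does not mix samples across the batch (BatchNorm in training mode, for example, would behave differently). The nn.Linear module, the sizes, the helper name apply_shared and the mean aggregation are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: 5 inputs, each a batch of 4 samples with 8 features.
num_inputs, batch_size, in_features, out_features = 5, 4, 8, 3
inputs = [torch.randn(batch_size, in_features) for _ in range(num_inputs)]
module = nn.Linear(in_features, out_features)


def apply_shared(module, inputs):
    """Run one module over a list of same-shaped batches in a single call."""
    stacked = torch.stack(inputs, dim=0)                 # (N, B, ...)
    n, b = stacked.shape[:2]
    flat = stacked.reshape(n * b, *stacked.shape[2:])    # (N*B, ...)
    out = module(flat)                                   # one forward pass
    return out.reshape(n, b, *out.shape[1:])             # back to (N, B, ...)


outs = apply_shared(module, inputs)

# Placeholder for some_aggregation_logic: average over the list dimension.
result = outs.mean(dim=0)                                # (B, out_features)

# Sanity check against the original sequential loop.
looped = torch.stack([module(inp) for inp in inputs], dim=0)
```

If the inputs have different shapes, they cannot be stacked like this, and the loop (or padding to a common shape) may still be needed.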
