Hi. I’m working on an architecture where my input is a list of tensors (say 5 or 10), and I need to feed each of them through the same subnet, then take their respective outputs and aggregate them. I implemented this sequentially, like:
outs = []
for inp in inputs:
    out = module(inp)
    outs.append(out)
result = some_aggregation_logic(outs)
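For concreteness, here is a minimal runnable version of that pattern. The actual subnet and aggregation aren't specified above, so the `nn.Linear` and the stack-and-mean step below are placeholders:

```python
import torch
import torch.nn as nn

# Placeholder for the shared subnet; the real architecture's module is unknown.
module = nn.Linear(8, 4)

# Five input tensors, each a batch of 16 samples with 8 features (assumed shapes).
inputs = [torch.randn(16, 8) for _ in range(5)]

# Sequential version: one forward pass per input tensor.
outs = []
for inp in inputs:
    out = module(inp)
    outs.append(out)

# Placeholder aggregation: stack along a new dim and average over it.
result = torch.stack(outs, dim=0).mean(dim=0)
print(result.shape)  # torch.Size([16, 4])
```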
I’m wondering what the appropriate way to parallelize that for-loop is (if there even is one). It’s probably worth noting that out = module(inp) processes the i-th object for every element in the batch.
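For reference, one common way to vectorize this kind of loop, when all the input tensors share the same shape, is to concatenate them into a single large batch, run one forward pass, and split the result back apart. A sketch under those assumptions (same placeholder `nn.Linear` subnet and stack-and-mean aggregation as hypothetical stand-ins):

```python
import torch
import torch.nn as nn

module = nn.Linear(8, 4)
inputs = [torch.randn(16, 8) for _ in range(5)]

# Concatenate the 5 inputs of shape (16, 8) into one (80, 8) batch,
# run a single forward pass, then split back into 5 outputs of (16, 4).
stacked = torch.cat(inputs, dim=0)
outs = module(stacked).chunk(len(inputs), dim=0)

# Placeholder aggregation, same as in the sequential version.
result = torch.stack(outs, dim=0).mean(dim=0)
print(result.shape)  # torch.Size([16, 4])
```

This only works when the subnet treats each batch element independently (which is true for most feed-forward modules, but not e.g. for BatchNorm, whose statistics would then mix the 5 groups).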