I am currently working with a model that applies a fixed set of heavy operations to different versions of the same input. It would be really nice if I could perform these operations in parallel, as each one takes a big chunk of the total execution time. In Theano, with graph optimizations, this would have been executed in parallel automatically, but I am not sure how to parallelize these operations in PyTorch. I looked into Python's multiprocessing module and PyTorch's wrapper for it, but I am not sure whether using it would maintain the autograd graph's integrity, or how it would behave during backprop (which is actually even more expensive than forward propagation).
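One thing I did come across is `torch.jit.fork` / `torch.jit.wait`, which look like autograd-aware futures, though as far as I understand the calls only truly run in parallel under TorchScript and may execute sequentially in eager mode. Here is a minimal sketch of what I imagine (with `my_func` replaced by a trivial placeholder, since I can't share the real one):

```python
import torch

def my_func(x):
    # Trivial placeholder for the real, expensive function.
    return x * 2

def forward_forked(x1, x2, x3, x4):
    # Launch each heavy call as a future, then wait on all of them.
    futures = [torch.jit.fork(my_func, xi) for xi in (x1, x2, x3, x4)]
    return sum(torch.jit.wait(f) for f in futures)
```

Is this the right tool here, or will the forks just run one after another unless the whole function is scripted?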
An NDA prevents me from sharing the exact model details and code, but here is a representative example of the model that captures the problem.
def forward(x):
    # transform the input in 4 different ways - transform is a lightweight operation
    x1 = transform(x, 0, 0)
    x2 = transform(x, 0, 1)
    x3 = transform(x, 1, 0)
    x4 = transform(x, 1, 1)
    # process each transformed input by the same function
    # my_func here is time consuming and the current bottleneck, but is internally optimized
    # ideally, all four `my_func` calls below should execute in parallel
    a = my_func(x1)
    b = my_func(x2)
    c = my_func(x3)
    d = my_func(x4)
    result = a + b + c + d
    return result
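One workaround I considered is stacking the four transformed inputs into a single batch so the heavy function runs once instead of four times, assuming `my_func` can handle an extra leading batch dimension (I'm not certain the real one can). A sketch with placeholder implementations of `transform` and `my_func`:

```python
import torch

# Hypothetical stand-ins for the real `transform` and `my_func`
# (which I can't share); both are assumed to broadcast over a batch dim.
def transform(x, i, j):
    return x + i * 10 + j

def my_func(x):
    return x * 2

def forward_batched(x):
    # Stack the four transformed inputs along a new leading dimension,
    # so the expensive function is invoked once on the whole batch.
    xs = torch.stack([transform(x, 0, 0),
                      transform(x, 0, 1),
                      transform(x, 1, 0),
                      transform(x, 1, 1)])
    # Autograd tracks the stacked tensor like any other op,
    # and summing over the batch dim reproduces a + b + c + d.
    return my_func(xs).sum(dim=0)
```

Would batching like this be the idiomatic way to get the parallelism, rather than reaching for multiprocessing?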
Is there any way I could speed up this function? Not being able to parallelize it hurts even more when I stack many such steps, which I am planning to do next. Any suggestions to speed this up in any way would help a lot.
Thanks!