Parallel construction of list of tensors

I have a list of tensors loaded on the GPU. I'd like to apply a function to each element of the list in parallel and construct another list from the results, e.g. as could be done with the `Parallel` and `delayed` utilities from joblib in Python. How can I achieve this parallelism in PyTorch?
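For context, the CPU-side pattern being referred to is joblib's `Parallel`/`delayed`. A rough sketch of it, reproduced here with the standard library's `concurrent.futures` so it runs without joblib (the function `f` and the input list are purely illustrative, not from any specific codebase):

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    # illustrative per-element function
    return x * 2

inputs = [1, 2, 3, 4]

# The joblib equivalent would be:
#   from joblib import Parallel, delayed
#   results = Parallel(n_jobs=4)(delayed(f)(x) for x in inputs)
with ThreadPoolExecutor(max_workers=4) as pool:
    # map f over the list concurrently, preserving input order
    results = list(pool.map(f, inputs))

print(results)  # [2, 4, 6, 8]
```

The question is how to get an analogous per-element parallel map when the elements are tensors already resident on the GPU.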