How to parallelize two independent operations?

How can I run the following two operations in parallel in PyTorch?

a = w + x
b = y + z

These two operations are independent of each other, so in principle they can be computed in parallel. Is there any way to do it?

If you’re working on CUDA, you can put those two operations on different CUDA streams:

Thanks! BTW, can you provide me with some sample code?
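
Here is a minimal sketch of the two-stream approach, not a definitive implementation. The helper name `parallel_add` and the tensor sizes are made up for illustration. It assumes the inputs were produced on the current stream, and it falls back to plain sequential adds when the tensors are not on a CUDA device.

```python
import torch


def parallel_add(w, x, y, z):
    """Compute a = w + x and b = y + z, on two CUDA streams when possible.

    Hypothetical helper for illustration only.
    """
    if not w.is_cuda:
        # CPU fallback: there are no streams, so just run sequentially.
        return w + x, y + z

    current = torch.cuda.current_stream(w.device)
    s1 = torch.cuda.Stream(device=w.device)
    s2 = torch.cuda.Stream(device=w.device)

    # Assumption: the inputs were produced on the current stream, so the
    # side streams must wait for that work before reading them.
    s1.wait_stream(current)
    s2.wait_stream(current)

    # Each addition is enqueued on its own stream; the GPU may overlap them.
    with torch.cuda.stream(s1):
        a = w + x
    with torch.cuda.stream(s2):
        b = y + z

    # Make the current stream wait for both results before they are used.
    current.wait_stream(s1)
    current.wait_stream(s2)

    # Tell the caching allocator that a and b are now used on the current
    # stream, so their memory is not reused too early.
    a.record_stream(current)
    b.record_stream(current)
    return a, b


device = "cuda" if torch.cuda.is_available() else "cpu"
w, x, y, z = (torch.randn(1024, 1024, device=device) for _ in range(4))
a, b = parallel_add(w, x, y, z)
```

Keep in mind that separate streams only allow the kernels to overlap; they do not force it. For small tensors the kernel launch overhead usually dominates, so you may not see any speedup. It is worth profiling before relying on this.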