In policy gradient methods, we need to run several batches to collect trajectories and then update the network. Can the trajectory collection run in parallel in PyTorch? I tried it before, but it seemed that PyTorch cannot transmit gradients between threads. Any advice? Many thanks!