Thanks a lot!
One thing I don’t quite understand here is:
The documentation doesn’t say that torch.gesv(Bs[i], As[i]) is non-blocking, so my understanding is that the next linear system is dispatched to the next GPU only after the previous one has finished. If that is the case, the tasks still run sequentially, which is suboptimal. Is there somewhere I can read about PyTorch’s blocking and non-blocking behavior?
Btw, synchronization isn’t necessary for the given example; I was just trying to figure out the best way to do non-blocking execution and synchronization in PyTorch.
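To make the question concrete, here is a minimal sketch of the dispatch-then-synchronize pattern I have in mind. It assumes that CUDA work is queued asynchronously per device, which is exactly the part I'd like to confirm. I've used torch.linalg.solve_ex in place of torch.gesv (my substitution; note that it takes (A, B) rather than (B, A)), since as far as I can tell from its docs it skips the error check that would otherwise force a device sync. The helper name solve_on_devices and the test matrices are made up for illustration, and the code falls back to the CPU when no GPU is present.

```python
import torch


def solve_on_devices(As, Bs):
    """Queue one linear solve per system, round-robin over the GPUs,
    and synchronize only once after everything has been dispatched."""
    n_gpu = torch.cuda.device_count()
    results = []
    for i, (A, B) in enumerate(zip(As, Bs)):
        # Round-robin over available GPUs; fall back to CPU if there are none.
        dev = torch.device("cuda", i % n_gpu) if n_gpu else torch.device("cpu")
        # non_blocking=True only helps when the source tensor is in pinned memory.
        A_d = A.to(dev, non_blocking=True)
        B_d = B.to(dev, non_blocking=True)
        # solve_ex returns (solution, info) and does not check info on the host.
        X, _info = torch.linalg.solve_ex(A_d, B_d)
        results.append(X)
    # Wait for all queued work on every GPU before the results are used.
    for d in range(n_gpu):
        torch.cuda.synchronize(d)
    return results


# Small, strictly diagonally dominant systems so the solves are well conditioned.
torch.manual_seed(0)
As = [torch.rand(4, 4) + 4 * torch.eye(4) for _ in range(3)]
Bs = [torch.rand(4, 2) for _ in range(3)]
Xs = solve_on_devices(As, Bs)
```

If the solves really are queued without blocking, the loop should return almost immediately and the wait should happen in the torch.cuda.synchronize calls.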