Execute the following commands simultaneously

Given

x1 = torch.rand(20, 1024, 1024, device=torch.device("cuda:0"), dtype=dtype)
x2 = torch.rand(20, 1024, 1024, device=torch.device("cuda:1"), dtype=dtype)
y1 = torch.rand(20, 1024, 1, device=torch.device("cuda:0"), dtype=dtype)
y2 = torch.rand(20, 1024, 1, device=torch.device("cuda:1"), dtype=dtype)

I wonder how I can execute the following two commands simultaneously.
z1 = torch.gesv(y1, x1)
z2 = torch.gesv(y2, x2)

Just like that 🙂
The CUDA API is asynchronous, so if the ops are on two different GPUs, they will run at the same time.
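A minimal sketch of this, assuming at least two CUDA devices are visible (it falls back to CPU otherwise so the logic still runs). Note that `torch.gesv(B, A)` was deprecated in later PyTorch releases in favor of `torch.linalg.solve(A, B)` (mind the swapped argument order), which is what this sketch uses:

```python
import torch

# Pick two devices; fall back to CPU if fewer than two GPUs are visible.
if torch.cuda.device_count() >= 2:
    dev1, dev2 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev1 = dev2 = torch.device("cpu")

dtype = torch.float32
x1 = torch.rand(20, 1024, 1024, device=dev1, dtype=dtype)
x2 = torch.rand(20, 1024, 1024, device=dev2, dtype=dtype)
y1 = torch.rand(20, 1024, 1, device=dev1, dtype=dtype)
y2 = torch.rand(20, 1024, 1, device=dev2, dtype=dtype)

# CUDA kernel launches are asynchronous: the second solve is queued on
# cuda:1 before the first one on cuda:0 has finished, so when the inputs
# live on different GPUs the two solves overlap without extra code.
# torch.gesv(y1, x1) corresponds to torch.linalg.solve(x1, y1).
z1 = torch.linalg.solve(x1, y1)  # solves x1 @ z1 = y1, batched over dim 0
z2 = torch.linalg.solve(x2, y2)

# Synchronize before timing the solves or reading results on the host.
if dev1.type == "cuda":
    torch.cuda.synchronize()
```

Because the launches only enqueue work, any wall-clock measurement must come after the synchronize call, otherwise you time the queueing rather than the computation.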

I built a model which contains not only a CNN for extracting features, but also a torch.gesv call to solve Ax=B. When I train this model on a single GPU (batch size 20), torch.gesv takes 100ms; however, when I train it on 4 GPUs (batch size 80, each GPU solving 20 samples), torch.gesv takes 300ms on each GPU. This extremely slows down my multi-GPU training. Why? I'd appreciate a reply, thank you!

This is a duplicate of Problem of torch.gesv on multi-GPUs, so I'll answer there.

Thank you very much.