Dear friends, I am using PyTorch for a linear algebra task to accelerate some calculations with GPUs. I have a function that performs some calculations on two given tensors, for example A and B. I run two instances of this function with two pairs of tensors, each pair allocated on a different GPU:
some_fun(Tensor_A1_GPU0, Tensor_B1_GPU0, GPU_0)  # First instance
some_fun(Tensor_A2_GPU1, Tensor_B2_GPU1, GPU_1)  # Second instance
As I understand it, PyTorch launches CUDA operations asynchronously by default, but in my case it waits for the first instance to finish and only then runs the second one, even though the two calls are completely independent. What could be the cause of this behavior?
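To make the setup concrete, here is a minimal sketch of what I mean. The body of some_fun here is just a stand-in matmul loop (my real function is more involved), and the tensor sizes and iteration count are placeholders:

```python
import time
import torch

def some_fun(A, B, device):
    # Stand-in workload: repeated matmuls. A and B are assumed to already
    # live on `device`, so the kernels are queued on that GPU.
    C = A
    for _ in range(100):
        C = C @ B
    return C

# Two independent pairs of tensors, one pair per GPU
A0 = torch.randn(4096, 4096, device="cuda:0")
B0 = torch.randn(4096, 4096, device="cuda:0")
A1 = torch.randn(4096, 4096, device="cuda:1")
B1 = torch.randn(4096, 4096, device="cuda:1")

torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
start = time.time()

C0 = some_fun(A0, B0, "cuda:0")  # First instance
C1 = some_fun(A1, B1, "cuda:1")  # Second instance

# Wait for both GPUs before stopping the timer
torch.cuda.synchronize("cuda:0")
torch.cuda.synchronize("cuda:1")
print(f"elapsed: {time.time() - start:.3f} s")
```

The total elapsed time is roughly the sum of the two calls rather than the maximum of them, which is what makes me think the second instance does not start until the first one has finished.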