Lazy Evaluation Behavior

Hello!

I have a quick question about lazy evaluation of long-running operations and whether or not they block (separate answers for .device('cpu') and .device('cuda'), if necessary).

def MyFunc(inputTensor):
    x = SomeLongRunningFunction(inputTensor)
    y = AnotherLongRunningFunction(inputTensor)
    return x, y

Does the computation of x block the computation of y, even though they are independent of each other? If yes, is there a simple way to make the computations occur simultaneously?
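My current guess for the CUDA case is that issuing the two calls on separate streams would let their kernels overlap. Here is a rough sketch of what I mean (my_func_streams is just a name I made up, and SomeLongRunningFunction / AnotherLongRunningFunction are the placeholders from above):

import torch

def my_func_streams(input_tensor):
    # Two independent streams so the kernels from the two placeholder
    # functions can overlap on the GPU.
    s1 = torch.cuda.Stream()
    s2 = torch.cuda.Stream()
    torch.cuda.synchronize()  # make sure input_tensor is ready before the side streams read it
    with torch.cuda.stream(s1):
        x = SomeLongRunningFunction(input_tensor)
    with torch.cuda.stream(s2):
        y = AnotherLongRunningFunction(input_tensor)
    torch.cuda.synchronize()  # wait for both streams before using x and y
    return x, y

Is that roughly the intended pattern, or is it unnecessary?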

A similar question for loops: is the loop unrolled so that the independent, indexed computations can occur simultaneously?

def MyLoopFunc(inputTensor, N, M, device):
    x = torch.empty((N, M), device=device)
    for idx in range(N):
        x[idx, :] = LongRunningOperation(inputTensor, idx)
    return x
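For the loop case, my guess at the same trick would be to round-robin the iterations over a small pool of streams, roughly like this (my_loop_func_streams and num_streams=4 are my own made-up names/values; LongRunningOperation is the placeholder from above):

import torch

def my_loop_func_streams(input_tensor, N, M, device, num_streams=4):
    x = torch.empty((N, M), device=device)
    streams = [torch.cuda.Stream() for _ in range(num_streams)]
    torch.cuda.synchronize()  # make sure x and input_tensor are ready before the side streams touch them
    for idx in range(N):
        # Each iteration writes its own row, so the iterations are independent.
        with torch.cuda.stream(streams[idx % num_streams]):
            x[idx, :] = LongRunningOperation(input_tensor, idx)
    torch.cuda.synchronize()  # wait for all streams before using x
    return x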

Looks like CUDA operations are at least launched asynchronously: https://pytorch.org/docs/stable/notes/cuda.html#asynchronous-execution
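If I understand that page correctly, a timing sketch like this should show the launches returning almost immediately, with the actual cost only appearing at the synchronize() call (the sizes and iteration count here are arbitrary):

import time
import torch

a = torch.randn(4096, 4096, device="cuda")
torch.cuda.synchronize()

t0 = time.time()
for _ in range(10):
    b = a @ a             # launches are queued and return almost immediately
t1 = time.time()
torch.cuda.synchronize()  # blocks until the queued kernels actually finish
t2 = time.time()

print(f"launch time: {t1 - t0:.4f}s, total time: {t2 - t0:.4f}s")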