I have a tensor x with x.shape = (batch_size, 10).

I want to add one to all of the elements, and I tried two different operations:

(1) x = x + 1

(2) for i in range(0, batch_size):
        x[i] = x[i] + 1

I got the same tensors from the two operations, but when I call loss.backward(), (2) takes much more time than (1) in backpropagation. What's the difference between them?

Most likely, it is because of the length of the computation graph created by the for-loop.

In (1), the addition is a single operation applied to the whole batch in one go (using multiple cores, on GPUs).

(2) has to loop back through the computation graph (with data transfers to and from the GPU) and calculate the gradients one by one.
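A minimal sketch of the comparison (assuming CPU tensors; the gap is typically larger on GPU). Note that an in-place write into a leaf tensor with requires_grad=True raises an error, so the loop version has to clone first, which is roughly what happens implicitly in the original code when x is an intermediate tensor:

```python
import time
import torch

batch_size = 1000

def timed_backward(op):
    # Fresh leaf tensor for each run so graphs don't accumulate.
    x = torch.randn(batch_size, 10, requires_grad=True)
    loss = op(x).sum()
    t0 = time.perf_counter()
    loss.backward()
    return time.perf_counter() - t0

def vectorized(x):
    # (1): a single AddBackward node covering the whole batch.
    return x + 1

def looped_inplace(x):
    # (2): batch_size in-place index assignments,
    # each recording its own node in the graph.
    y = x.clone()  # clone: cannot write in-place into a leaf that requires grad
    for i in range(batch_size):
        y[i] = y[i] + 1
    return y

t1 = timed_backward(vectorized)
t2 = timed_backward(looped_inplace)
print(f"(1) vectorized: {t1:.4f}s   (2) loop: {t2:.4f}s")
```

On a typical run the looped version's backward is orders of magnitude slower, because backward() must walk one graph node per row instead of one node total.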

Thanks for your answer. I tried

(3) for i in range(0, batch_size):
        x = x + 1

and I found that (2) still took much more time than (3). Is there any difference between the computation graphs they create? It seems that (2) is an in-place operation; could the result be related to that?
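The graphs do differ, and it can be inspected via grad_fn. A small sketch (assuming a plain CPU tensor; also note that (3) as written adds 1 batch_size times, so it does not produce the same values as (1) and (2), only a comparable graph length):

```python
import torch

batch_size = 4
x = torch.randn(batch_size, 10, requires_grad=True)

# (3)-style: each out-of-place x = x + 1 creates a new tensor,
# so the loop builds a simple chain of addition nodes.
y = x
for i in range(batch_size):
    y = y + 1
print(y.grad_fn)  # an AddBackward-type node

# (2)-style: each in-place index assignment is recorded as a
# slice-copy node, one per row, which is more work to traverse
# and differentiate in backward().
z = x.clone()
for i in range(batch_size):
    z[i] = z[i] + 1
print(z.grad_fn)  # a CopySlices-type node
```

So (2) is indeed in-place: autograd must track which slice was overwritten at each step, whereas (3) is a plain chain of cheap element-wise additions.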