What's the difference between these two operations?

tyj1997 · December 9, 2018, 11:49am

I have a tensor x with x.shape == (batch_size, 10).
I want to add one to every element, and I tried two different ways:
(1) x = x + 1
(2) for i in range(0, batch_size):
        x[i] = x[i] + 1
Both give the same tensor, but when I call loss.backward(), (2) takes much longer than (1) during backpropagation. What's the difference between them?
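A minimal sketch for reproducing the comparison, assuming CPU tensors and an arbitrary batch_size of 512 (the post does not give one). Here x is built as w * 2 so that it is a non-leaf tensor: PyTorch refuses in-place writes to a leaf tensor that requires grad, so (2) would raise an error on such a leaf. The names op1, op2 and run are made up for this example. It checks that both versions give identical outputs and gradients and prints the backward timings, which will vary from machine to machine.

```python
import time
import torch

def op1(x):
    # (1): one vectorized add over the whole (batch_size, 10) tensor
    return x + 1

def op2(x):
    # (2): row-by-row in-place update of x (x must be a non-leaf tensor)
    for i in range(x.shape[0]):
        x[i] = x[i] + 1
    return x

torch.manual_seed(0)
w = torch.randn(512, 10, requires_grad=True)

def run(op):
    w.grad = None                  # reset so gradients do not accumulate between runs
    out = op(w * 2)                # w * 2 is a fresh non-leaf tensor for each run
    start = time.perf_counter()
    out.sum().backward()           # only the backward pass is timed
    elapsed = time.perf_counter() - start
    return out.detach(), w.grad.clone(), elapsed

out1, grad1, t1 = run(op1)
out2, grad2, t2 = run(op2)
print(f"(1) backward: {t1 * 1e3:.3f} ms   (2) backward: {t2 * 1e3:.3f} ms")
```

For more careful measurements, running each version several times (or using torch.utils.benchmark) would smooth out one-off noise such as allocator warm-up.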

InnovArul · December 9, 2018, 12:24pm

Most likely it is because of the length of the computation graph created by the for-loop.
In (1), it is a single operation, done in one go using many cores (on GPUs).
(2) records separate operations for every iteration of the loop, so backward has to walk back through all of those graph nodes and compute the gradients one by one, each as its own small operation (a separate kernel launch on the GPU).
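One way to see the graph-length effect is to walk grad_fn.next_functions from the output and count the distinct autograd nodes. A sketch with hypothetical helpers count_nodes and graph_sizes; the exact counts depend on the PyTorch version, but (1) should stay constant while (2) grows roughly linearly with batch_size.

```python
import torch

def count_nodes(t):
    """Count distinct autograd nodes reachable from t.grad_fn."""
    seen, stack = set(), [t.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:   # None marks inputs that need no grad
            continue
        seen.add(fn)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return len(seen)

def graph_sizes(batch_size, features=10):
    w = torch.randn(batch_size, features, requires_grad=True)
    vec = w * 2 + 1                    # (1): one add for the whole batch
    loop = w * 2                       # (2): non-leaf, modified row by row
    for i in range(batch_size):
        loop[i] = loop[i] + 1
    return count_nodes(vec), count_nodes(loop)

for b in (1, 8, 64):
    print(b, graph_sizes(b))
```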

tyj1997 · December 9, 2018, 12:57pm

Thanks for your answer. I tried
(3) for i in range(0, batch_size):
        x = x + 1
and found that (2) still took much more time than (3). Is there any difference between the computation graphs they create? It seems that (2) is an in-place operation; is the result related to this?
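To compare the graphs of (2) and (3) directly, one can list the autograd node types each one records. A sketch, assuming a recent PyTorch version and a hypothetical helper node_types. Note that (3) adds batch_size to every element rather than 1, so it does not compute the same result; it is only useful as a graph comparison. In (2), each row update reads a view of x (a select-backward node), adds one, and writes back in place, which autograd records as a CopySlices node on the base tensor; (3) records one add node per iteration. As far as I understand the autograd internals, each CopySlices node's backward works on a gradient the size of the whole tensor rather than just the row, which would add up to roughly batch_size full-tensor copies in (2).

```python
from collections import Counter
import torch

def node_types(t):
    """Histogram of autograd node class names reachable from t.grad_fn."""
    seen, stack = set(), [t.grad_fn]
    while stack:
        fn = stack.pop()
        if fn is None or fn in seen:
            continue
        seen.add(fn)
        stack.extend(next_fn for next_fn, _ in fn.next_functions)
    return Counter(type(fn).__name__ for fn in seen)

batch_size = 4
w = torch.randn(batch_size, 10, requires_grad=True)

x2 = w * 2                 # (2): in-place, one row at a time
for i in range(batch_size):
    x2[i] = x2[i] + 1

x3 = w * 2                 # (3): out-of-place, whole tensor each time
for i in range(batch_size):
    x3 = x3 + 1

types2, types3 = node_types(x2), node_types(x3)
print("(2):", dict(types2))
print("(3):", dict(types3))
```

Node class names (for example whether a suffix like 0 is appended) can differ between PyTorch releases, so the printed dictionaries may not match exactly across versions.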