Does backprop speed depend on how values are assigned to a tensor?

Recently I found that how I assign values to a tensor affects the speed of backward().
My test code is below.
In short, I fill a (1000, 1000) tensor with ones in two different ways: assigning row by row with indexing, and calling stack() on a list of rows.
I then multiply all values by 2 and take the sum.

import time
import torch

start = time.time()
from_empty = torch.empty(1000, 1000)
for i in range(1000):
    from_empty[i, :] = torch.ones(1000, requires_grad=True)
from_empty = 2 * from_empty
from_empty = from_empty.sum()
end = time.time()
print("assign by index", end - start)

start = time.time()
tmp = []
for i in range(1000):
    tmp.append(torch.ones(1000, requires_grad=True))
stacked = torch.stack(tmp, dim=1)
stacked = 2 * stacked
stacked = stacked.sum()
end = time.time()
print("stack", end-start)

The result is:

assign by index 1.5671675205230713
stack 0.08823513984680176

Why is the first one so slow?



You can try the torchviz package to visualize the generated graph and see why (change the size of the first dimension to 10 first :wink: )

As you will see, one version creates the output in a single operation, while the other builds it with 1000 in-place operations, each of which adds its own node to the autograd graph.
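You can also see the difference without torchviz by walking the backward graph through `grad_fn.next_functions`. This is a rough sketch (the helper `graph_nodes` and the size `n = 10` are my own choices, not part of torch): the in-place version produces a chain of `CopySlices` nodes, one per row, while the stack version has a single `StackBackward` node, so its graph is much smaller.

```python
import torch

def graph_nodes(fn, seen=None):
    # Recursively collect all backward-graph nodes reachable from a grad_fn.
    if seen is None:
        seen = set()
    if fn is None or fn in seen:
        return seen
    seen.add(fn)
    for next_fn, _ in fn.next_functions:
        graph_nodes(next_fn, seen)
    return seen

n = 10  # small size so the graph stays readable

# Variant 1: row-by-row in-place assignment
t = torch.empty(n, n)
for i in range(n):
    t[i, :] = torch.ones(n, requires_grad=True)
inplace_out = (2 * t).sum()

# Variant 2: a single stack
rows = [torch.ones(n, requires_grad=True) for _ in range(n)]
stack_out = (2 * torch.stack(rows, dim=1)).sum()

print("in-place graph nodes:", len(graph_nodes(inplace_out.grad_fn)))
print("stack graph nodes:   ", len(graph_nodes(stack_out.grad_fn)))
```

The node counts grow with `n` for the in-place version (one `CopySlices` per assignment), which is why the indexed version gets so much slower at size 1000.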


Thanks for the clear explanation :slight_smile:
