This question is about efficiency and speed. Say I need to call a function several times, where each call returns a tensor, and I want to combine the results into one big tensor.
Here are examples of two ways to do this:
```python
import torch

# assume that function() returns a (16, 16) tensor

# first way: preallocate, then assign each result into a slice
a = torch.empty(10, 16, 16)
for i in range(10):
    a[i, :, :] = function(i)

# second way: collect the results in a list, then stack
a = []
for i in range(10):
    a.append(function(i))
a = torch.stack(a, 0)
```
… and there are more ways, e.g. using torch.cat.
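For concreteness, here is a sketch of the torch.cat variant (with a dummy stand-in for function, since the real one doesn't matter here):

```python
import torch

def function(i):
    # dummy stand-in for the real function; returns a (16, 16) tensor
    return torch.randn(16, 16)

# third way: concatenate the unsqueezed results along a new leading dim
a = torch.cat([function(i).unsqueeze(0) for i in range(10)], dim=0)
print(a.shape)  # torch.Size([10, 16, 16])
```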
First natural question: which way is generally more efficient in memory and/or faster in computation?
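A rough benchmark sketch of the two variants (the dummy function and the sizes here are assumptions; absolute numbers will depend on the real function, tensor sizes, and device):

```python
import timeit
import torch

def function(i):
    # dummy stand-in for the real function
    return torch.randn(16, 16)

def preallocate():
    a = torch.empty(10, 16, 16)
    for i in range(10):
        a[i, :, :] = function(i)
    return a

def append_stack():
    parts = []
    for i in range(10):
        parts.append(function(i))
    return torch.stack(parts, 0)

print("preallocate :", timeit.timeit(preallocate, number=1000))
print("append+stack:", timeit.timeit(append_stack, number=1000))
```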
Second: in both presented examples, to my knowledge, the function output is first stored in a separate tensor and only then, when it is stacked or assigned into a slice, copied from one memory location to another, which I believe is inefficient. So, is there any way to make the function output be written directly into a preallocated memory location (in this case, into the tensor a)?
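The copy can be seen directly by comparing data pointers; in this small check (again with a dummy stand-in for function), the slice and the function output live at different addresses:

```python
import torch

def function(i):
    return torch.randn(16, 16)  # dummy stand-in for the real function

a = torch.empty(10, 16, 16)
r = function(0)
a[0] = r  # slice assignment copies r's data into a's storage
# different addresses => the data was copied, not shared
print(a[0].data_ptr() == r.data_ptr())  # False
```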
UPDATE: OK, it turns out this is not really possible with custom Python functions. So the question becomes narrower, and concerns only those PyTorch operations that always output a new tensor, not sharing memory with anything else.
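For such built-in operations, the out= keyword argument seems to be the mechanism I'm after; a minimal sketch with torch.matmul standing in for the operation (an assumption, the real computation may differ):

```python
import torch

a = torch.empty(10, 16, 16)
x = torch.randn(16, 16)
y = torch.randn(16, 16)
for i in range(10):
    # a[i] is a view into a's storage, so out= writes the result
    # directly into the preallocated tensor, with no intermediate copy
    torch.matmul(x, y, out=a[i])
```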
Thanks in advance.