This question is about efficiency and speed. Say I need to call a function several times, where each call returns a tensor, and I want to combine the results into one big tensor.
Here are examples of two ways to do this:
```python
import torch

# assume that function() returns a (16, 16) tensor

# first way: preallocate, then assign each result into a slice
a = torch.empty(10, 16, 16)
for i in range(10):
    a[i, :, :] = function(i)

# second way: collect the results in a list, then stack
a = []
for i in range(10):
    a.append(function(i))
a = torch.stack(a, 0)
```
… and there are more ways, e.g. using torch.cat.
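For concreteness, here is a sketch of the torch.cat variant (with a dummy stand-in for function, since the real one doesn't matter here):

```python
import torch

def function(i):
    # dummy stand-in for the real function; returns a (16, 16) tensor
    return torch.randn(16, 16)

# third way: concatenate the unsqueezed results along a new leading dim
a = torch.cat([function(i).unsqueeze(0) for i in range(10)], dim=0)
print(a.shape)  # torch.Size([10, 16, 16])
```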
First natural question: which way is generally more efficient in memory and/or faster in computation?
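A rough benchmark sketch of the two variants (the dummy function and the sizes here are assumptions; absolute numbers will depend on the real function, tensor sizes, and device):

```python
import timeit
import torch

def function(i):
    # dummy stand-in for the real function
    return torch.randn(16, 16)

def preallocate():
    a = torch.empty(10, 16, 16)
    for i in range(10):
        a[i, :, :] = function(i)
    return a

def append_stack():
    parts = []
    for i in range(10):
        parts.append(function(i))
    return torch.stack(parts, 0)

print("preallocate :", timeit.timeit(preallocate, number=1000))
print("append+stack:", timeit.timeit(append_stack, number=1000))
```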
Second: in both presented examples, to my knowledge, the function output is first stored in a separate tensor and only then, when it is stacked or assigned into a slice, copied from one memory location to another, which I believe is inefficient. So, is there any way to make the function output be written directly into a preallocated memory location (in this case, into the tensor a)?
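The copy can be seen directly by comparing data pointers; in this small check (again with a dummy stand-in for function), the slice and the function output live at different addresses:

```python
import torch

def function(i):
    return torch.randn(16, 16)  # dummy stand-in for the real function

a = torch.empty(10, 16, 16)
r = function(0)
a[0] = r  # slice assignment copies r's data into a's storage
# different addresses => the data was copied, not shared
print(a[0].data_ptr() == r.data_ptr())  # False
```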
UPDATE: OK, it turns out this is not really possible with custom Python functions. So the question becomes narrower, and concerns only those PyTorch operations that always output a new tensor, not sharing memory with anything else.
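For such built-in operations, the out= keyword argument seems to be the mechanism I'm after; a minimal sketch with torch.matmul standing in for the operation (an assumption, the real computation may differ):

```python
import torch

a = torch.empty(10, 16, 16)
x = torch.randn(16, 16)
y = torch.randn(16, 16)
for i in range(10):
    # a[i] is a view into a's storage, so out= writes the result
    # directly into the preallocated tensor, with no intermediate copy
    torch.matmul(x, y, out=a[i])
```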
Thanks in advance.