We know that torch.tensor([1, 2, 3, 4, 5]) will create a new tensor. If I have an existing buffer, can I make this function call directly move data there?
e.g., what I want to achieve:
buffer = torch.empty((300,), dtype=torch.int64)
data = torch.tensor([1, 2, 3, 4, 5])
buffer[5:10] = data
However, buffer[5:10] = data involves an additional copy. I want to optimize that.
Another solution is to write a for-loop:
for i in range(5, 10):
    buffer[i] = data[i - 5]
However, I’m afraid that the Python for-loop overhead will be more than the additional copy.
I assume that your concern is the copy involved in:
data_as_list = [1, 2, 3, 4, 5] # a python list
data = torch.tensor(data_as_list)  # copy the list data into a (new) tensor
That is, you would like to achieve something like buffer[5:10] = data_as_list
without any additional copies. I don’t believe that pytorch gives you any way to
do this. The reason is that the elements of a python list are separate python int
objects rather than a contiguous block of int64s, so pytorch can’t simply copy the
data_as_list memory straight into the buffer memory.
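One way to see the difference (a minimal sketch, pulling in numpy purely for illustration): converting a list always copies, whereas an object that already owns contiguous memory can be wrapped without a copy:
import numpy as np
import torch

data_as_list = [1, 2, 3, 4, 5]
t1 = torch.as_tensor(data_as_list)            # must copy: the list holds separate python int objects
arr = np.array(data_as_list, dtype=np.int64)
t2 = torch.from_numpy(arr)                    # no copy: arr's contiguous memory is shared with t2
arr[0] = 42
print(t2[0])                                  # tensor(42) -- t2 sees the change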
If data is already packaged as a pytorch tensor, buffer[5:10] = data is probably
the most efficient way to go.
(If data is a python list rather than a pytorch tensor, you probably want to create a
new tensor and then assign it into buffer, because the python for-loop overhead
will almost certainly be worse than the cost of creating the additional tensor.)
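For concreteness, here is a small sketch of the two cases described above (the variable names are just illustrative):
import torch

buffer = torch.empty((300,), dtype=torch.int64)

# case 1: data is already a tensor -- a single copy into the slice
data = torch.tensor([1, 2, 3, 4, 5])
buffer[5:10] = data                        # equivalent to buffer[5:10].copy_(data)

# case 2: data is a python list -- one copy into a temporary tensor, then one into the slice
data_as_list = [1, 2, 3, 4, 5]
buffer[5:10] = torch.tensor(data_as_list)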
Some frameworks / apis do offer the ability to create a new object in a specific,
caller-supplied memory location, but I’m not aware of anything like this in pytorch,
so two separate copy operations are needed. One thing that can help is torch.frombuffer:
if the source data already lives in an object that exposes the python buffer protocol
(such as an array.array), pytorch can wrap that memory as a tensor without copying it:
import array
import torch
# Create a Python array.array object of type 'float'
data = array.array('f', [1.0, 2.0, 3.0, 4.0])
# Convert the array.array object to a PyTorch tensor that shares the same data
tensor = torch.frombuffer(data, dtype=torch.float32)
# Display the tensor
print(tensor) # tensor([1., 2., 3., 4.])
# Modify the original array and see the changes in the tensor
data[1] = 5.0
print(tensor) # tensor([1., 5., 3., 4.])
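Applied to the original int64 example, this removes the list-to-tensor copy, leaving only the single copy into the slice of buffer (a minimal sketch, assuming 'q' is the array.array typecode for signed 64-bit integers on your platform):
import array
import torch

buffer = torch.empty((300,), dtype=torch.int64)

# source data held in an array.array of 64-bit ints ('q' = signed long long)
src = array.array('q', [1, 2, 3, 4, 5])

# frombuffer wraps src's memory without copying; the assignment then copies once
buffer[5:10] = torch.frombuffer(src, dtype=torch.int64)
print(buffer[5:10])  # tensor([1, 2, 3, 4, 5])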