We know that torch.tensor([1, 2, 3, 4, 5]) will create a new tensor. If I have an existing buffer, can I make this function call directly move data there?
e.g., what I want to achieve:
buffer = torch.empty((300,), dtype=torch.int64)
data = torch.tensor([1, 2, 3, 4, 5])
buffer[5:10] = data
However, buffer[5:10] = data involves an additional copy. I want to optimize that.
Another solution is to write a for-loop:
for i in range(5, 10):
    buffer[i] = data[i - 5]
However, I’m afraid that the Python for-loop overhead will be more than the additional copy.
I assume that your concern is the copy involved in:
data_as_list = [1, 2, 3, 4, 5] # a python list
data = torch.tensor(data_as_list)  # copy the list data into a (new) tensor
That is, you would like to achieve something like buffer[5:10] = data_as_list
without any additional copies. I don’t believe that pytorch gives you any way to
do this. The reason is that the elements of a python list are separate python int
objects rather than a contiguous block of int64s, so pytorch can’t simply copy the
data_as_list memory straight into the buffer memory.
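One way to see the difference (a minimal sketch, pulling in numpy purely for illustration): converting a list always copies, whereas an object that already owns contiguous memory can be wrapped without a copy:
import numpy as np
import torch

data_as_list = [1, 2, 3, 4, 5]
t1 = torch.as_tensor(data_as_list)            # must copy: the list holds separate python int objects
arr = np.array(data_as_list, dtype=np.int64)
t2 = torch.from_numpy(arr)                    # no copy: arr's contiguous memory is shared with t2
arr[0] = 42
print(t2[0])                                  # tensor(42) -- t2 sees the change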
If data is already packaged as a pytorch tensor, buffer[5:10] = data is probably
the most efficient way to go.
(If data is a python list rather than a pytorch tensor, you probably want to create a
new tensor and then assign it into buffer, because the python for-loop overhead
will almost certainly be worse than the cost of creating the additional tensor.)
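For concreteness, here is a small sketch of the two cases described above (the variable names are just illustrative):
import torch

buffer = torch.empty((300,), dtype=torch.int64)

# case 1: data is already a tensor -- a single copy into the slice
data = torch.tensor([1, 2, 3, 4, 5])
buffer[5:10] = data                        # equivalent to buffer[5:10].copy_(data)

# case 2: data is a python list -- one copy into a temporary tensor, then one into the slice
data_as_list = [1, 2, 3, 4, 5]
buffer[5:10] = torch.tensor(data_as_list)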
Some frameworks / apis do offer the ability to create a new object in a specific,
caller-supplied memory location, but I’m not aware of anything like this in pytorch,
so two separate copy operations are needed. One thing that can help is torch.frombuffer:
if the source data already lives in an object that exposes the python buffer protocol
(such as an array.array), pytorch can wrap that memory as a tensor without copying it:
import array
import torch
# Create a Python array.array object of type 'float'
data = array.array('f', [1.0, 2.0, 3.0, 4.0])
# Convert the array.array object to a PyTorch tensor that shares the same data
tensor = torch.frombuffer(data, dtype=torch.float32)
# Display the tensor
print(tensor) # tensor([1., 2., 3., 4.])
# Modify the original array and see the changes in the tensor
data[1] = 5.0
print(tensor) # tensor([1., 5., 3., 4.])
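Applied to the original int64 example, this removes the list-to-tensor copy, leaving only the single copy into the slice of buffer (a minimal sketch, assuming 'q' is the array.array typecode for signed 64-bit integers on your platform):
import array
import torch

buffer = torch.empty((300,), dtype=torch.int64)

# source data held in an array.array of 64-bit ints ('q' = signed long long)
src = array.array('q', [1, 2, 3, 4, 5])

# frombuffer wraps src's memory without copying; the assignment then copies once
buffer[5:10] = torch.frombuffer(src, dtype=torch.int64)
print(buffer[5:10])  # tensor([1, 2, 3, 4, 5])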