Create Tensor from data pointer

To get the address of the first element of a tensor, one can call the Tensor.data_ptr() method.

I’m wondering if it would be possible to create a Tensor knowing the device, the pointer returned by Tensor.data_ptr, and the Tensor’s shape.


I had the same question.

I looked at some of the functions in Type.h (e.g. Type.h:114) and tried to find their usage in the test cases (e.g. atest.cpp:61) to figure out how to use them.

In particular this function looked useful (Type.h:114):

Tensor tensorFromBlob(void * data,
                      IntList sizes,
                      const std::function<void(void*)> & deleter=noop_deleter) const;

This is a member function of the Type class.

To make a Tensor with it, first pick a Type by calling either CPU() or CUDA() (Context.h:135-141) with the desired ScalarType (i.e. data type) as the argument, e.g. one of kByte, kChar, kShort, kInt, kLong, kHalf, kFloat, or kDouble. These k-prefixed names are just enum aliases for the supported data types (ScalarType.h:15-22).

The first argument is a pointer to the data. The second is the size list, e.g. {3,3} for a 3x3 Tensor. The last (optional) argument is a callback (the deleter) that is run when the Tensor is destroyed, intended for freeing the original data; it should take a single void* argument and return nothing.

atest.cpp:61-72 gives a good example of usage (modified here for standalone readability):

// in namespace at::
float data[] = { 1, 2, 3,
                 4, 5, 6};
Tensor f = CPU(kFloat).tensorFromBlob(data, {1,2,3});
TensorAccessor<float,3> f_a = f.accessor<float,3>();

assert(f_a[0][0][0] == 1.0);
assert(f_a[0][1][1] == 5.0);

assert(f.strides()[0] == 6);
assert(f.strides()[1] == 3);
assert(f.strides()[2] == 1);

assert(f.sizes()[0] == 1);
assert(f.sizes()[1] == 2);
assert(f.sizes()[2] == 3);

EDIT: Actually I just noticed this was already in the ATen README.

  1. Is there something like this for Python?

  2. Is there a way to share the same device-allocated tensor across multiple processes or threads? (In CUDA C++, you can share the same cudaMalloc-ed array across processes using the CUDA IPC API.)