To get the address of the first element of a tensor one can call the method:
Tensor.data_ptr
I’m wondering if it would be possible to create a Tensor by knowing the device, the result obtained from Tensor.data_ptr and the Tensor shape.
I had the same question.
I looked at some of the functions in Type.h (e.g. Type.h:114) and tried to find their usage in the test cases (e.g. atest.cpp:61) to figure out how to use them.
In particular this function looked useful (Type.h:114):
Tensor tensorFromBlob(void * data,
                      IntList sizes,
                      const std::function<void(void*)> & deleter=noop_deleter) const;
This is a member function of the Type class. To make a Tensor with it, first pick a Type by calling either CPU() or CUDA() (Context.h:135-141) with the desired ScalarType (i.e. data type) as the argument, e.g. one of kByte, kChar, kShort, kInt, kLong, kHalf, kFloat, or kDouble. These kDataType names are just enum aliases for the acceptable data types (ScalarType.h:15-22).
Then the first argument is a pointer to the data. The second argument is the size tuple, e.g. {3,3} for a 3x3 Tensor. The last (optional) argument allows you to specify a callback function that is run when the Tensor is destroyed (intended for freeing the original data). This callback function (deleter) should take a pointer as its only argument and return nothing.
atest.cpp:61-72 gives a good example of usage (modified for standalone readability):
// in namespace at::
float data[] = {1, 2, 3,
                4, 5, 6};
Tensor f = CPU(kFloat).tensorFromBlob(data, {1,2,3});
TensorAccessor<float,3> f_a = f.accessor<float,3>();
assert(f_a[0][0][0] == 1.0);
assert(f_a[0][1][1] == 5.0);
assert(f.strides()[0] == 6);
assert(f.strides()[1] == 3);
assert(f.strides()[2] == 1);
assert(f.sizes()[0] == 1);
assert(f.sizes()[1] == 2);
assert(f.sizes()[2] == 3);
EDIT: Actually I just noticed this was already in the ATen README.
Is there something like this for Python?
Is there a way to share the same device-allocated tensor with multiple processes or threads? (In CUDA C++, you can share the same cudaMalloc-ed array across processes using the CUDA IPC API.)