C++: seemingly premature memory deallocation of a temporary variable

Hello!

While implementing some C++ extensions, I noticed that if I transfer a GPU tensor to the CPU using a one-liner (auto my_tensor_ptr = my_tensor.cpu().data_ptr<float>();), this pointer becomes a dangling pointer and the data indexed via the brackets operator [] is no longer valid. It can easily be fixed by introducing a temporary variable:

auto my_tensor_cpu = my_tensor.cpu();
auto my_tensor_ptr = my_tensor_cpu.data_ptr<float>();

Nevertheless, I still don’t understand why this deallocation occurs.
Any explanation will be highly appreciated.

Hi,

This is because the data returned by data_ptr<>() is only valid as long as the original Tensor exists.
But here the CPU tensor goes out of scope, so the data_ptr becomes invalid. This is expected.


Hi,

Yes, this is clear.
But why does it cease to exist? Does it go out of scope? What is its scope? Granted, I am not a C++ expert, but the PyTorch code is so entangled that I can’t even find where the cpu method is implemented (maybe here) to check how this tensor is created.

This would actually be the same in Python: if you create an object in the middle of a line, it will be gone by the end of it if you don’t keep a reference to it.
Here you create a Tensor when doing my_tensor.cpu() (let’s call it foo). Then this object is used to call .data_ptr<float>() on it. Then the result of that call (a float*) is saved into your variable my_tensor_ptr. At this point, nothing references foo anymore, so foo is de-allocated.
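
To make the lifetime concrete, here is a minimal sketch of both patterns (assuming libtorch and a function receiving an existing tensor; the names are illustrative):

#include <torch/torch.h>

void example(const torch::Tensor& my_tensor) {
    // Dangerous: my_tensor.cpu() creates a temporary Tensor ("foo").
    // The temporary is destroyed at the end of the full expression,
    // so this pointer dangles immediately.
    float* dangling = my_tensor.cpu().data_ptr<float>();
    // reading dangling[0] here is undefined behavior

    // Safe: the named variable keeps the CPU copy (and its storage) alive.
    auto my_tensor_cpu = my_tensor.cpu();
    float* valid = my_tensor_cpu.data_ptr<float>();
    float first = valid[0];  // fine for as long as my_tensor_cpu is in scope
}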


I think it is not exactly the same as in Python, because in Python it may or may not be garbage-collected, whereas in C++ it is guaranteed to be destroyed ("All temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created"). So, is the object returned by the cpu() method allocated on the stack? I assume that if it were dynamically allocated, some sort of memory leak would have occurred.
Still, it feels like the created CPU tensor should hold a weak reference to the created pointer to prevent its deallocation, and this would allow that one-liner.

This is not possible to do. data_ptr<float>() returns just a raw pointer to float; we cannot attach any reference or anything to it.
It is the same as if you get a raw pointer to an object in C++ and then delete the object: the raw pointer never keeps the object alive.
(To answer the allocation question: the temporary Tensor object itself lives on the stack, but it owns its underlying storage through reference counting, so destroying the temporary releases the storage and no leak occurs.)
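
The same behavior can be shown in plain C++, no libtorch needed; a raw pointer taken from a reference-counted owner does not extend the owner’s lifetime:

#include <memory>

int main() {
    int* raw = nullptr;
    {
        auto owner = std::make_shared<int>(42);
        raw = owner.get();  // the raw pointer does not participate in refcounting
    }  // owner destroyed here; the int is freed
    // *raw is now undefined behavior: the raw pointer did not keep the object alive
    return 0;
}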

Note that in CPython, object lifetimes are reference-counted, so (unless you create reference cycles, which are rare) objects are destroyed as soon as they are no longer referenced.


Hello!
If I want to access each element in a CUDA float tensor, the C++ code tensors[0].data_ptr(); returns the first address. We expected *(tensors[0].data_ptr()) to return the data at the first address, but a segmentation fault happens.
Any explanation will be appreciated.

Hi,

GPU data can only be accessed from a GPU function; regular C++ code can only access data in RAM.
You will need to write a GPU kernel to work with GPU data.
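
To illustrate the distinction, a short sketch (assuming cuda_tensor is a CUDA float tensor):

#include <torch/torch.h>

void host_vs_device(const torch::Tensor& cuda_tensor) {
    float* device_ptr = cuda_tensor.data_ptr<float>();
    // float v = *device_ptr;          // crashes: this is a device address,
    //                                 // and host code can only dereference RAM
    auto doubled = cuda_tensor * 2;    // fine: libtorch dispatches a GPU kernel for this
}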

Thanks for your quick help! Could I move the GPU data to the CPU using .cpu() and then access each element in the tensor? I tried tensors[0].cpu().data but it failed with "'tensors.c10::ArrayRef<at::Tensor>::operator[](0ul).at::Tensor::cpu' does not have class type".

You will need to use the same data_ptr<float>() call for this to work.

I tried this as well; it failed with a similar error.

I’m not sure you can do [0] on a C++ tensor directly; I think you have to use .index().
Doing the same thing without the indexing should work, no?
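
For reference, a minimal sketch of the safe pattern (assuming tensors is a std::vector<torch::Tensor> of CUDA float tensors; the names are illustrative):

#include <torch/torch.h>
#include <vector>
#include <iostream>

void print_first_element(const std::vector<torch::Tensor>& tensors) {
    // Copy the GPU tensor to the CPU and keep the copy in a named
    // variable so its storage stays alive while we read from it.
    auto cpu_tensor = tensors[0].cpu();
    const float* data = cpu_tensor.data_ptr<float>();
    std::cout << data[0] << std::endl;

    // Alternative for a single element: index, then extract a scalar.
    std::cout << cpu_tensor.index({0}).item<float>() << std::endl;
}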

Thanks. I am going to implement it with a kernel; I appreciate your help.