Segmentation fault when accessing elements via data_ptr on the GPU


I am running into a problem I am not able to make sense of. Maybe you can help me out.

I am working on a C++ extension and reimplementing some functions for performance.
In one of them I am trying to iterate through a 1-dimensional tensor and access each element in it.

torch::Tensor unpack(const torch::Tensor &self) {
    TORCH_CHECK(self.dim() == 1, "Only one dimensional tensor allowed");
    TORCH_CHECK(self.is_contiguous(), "Tensor has to be contiguous");
    TORCH_CHECK(self.dtype() == torch::kInt64, "Tensor has to be long type");

    // Get pointer to data
    auto *self_ptr = self.data_ptr<int64_t>();

    // Access data
    // Crashes when the tensor is on the GPU
    std::cout << self_ptr[0] << "\n"; // index 0
    std::cout << self_ptr[1] << "\n"; // index 1

    // ...
    return self;
}

Sample input:

auto opts = torch::TensorOptions().dtype(torch::kInt64);
auto tensor = torch::ones({4}, opts);


This all works fine when the input tensor is on the CPU, BUT if I pass a tensor that lives on the GPU I run into a segmentation fault. Note that the tensor is ultimately passed in from the Python program, in case that is relevant.


I tested moving the tensor back to the CPU first, and that works, so it is a workaround…
Is there a way of accessing the data of a GPU tensor without the overhead of moving it?

I hope someone can help me out here.


You won’t be able to access device memory on the host via a direct pointer dereference: `data_ptr()` on a CUDA tensor returns a pointer into GPU memory, which is not valid in host code. You would either need to move the data back to the host or access the memory in a CUDA kernel.