Segfault when indexing into a CUDA pointer

I’m extracting a pointer to the underlying GPU memory of a uint8 tensor like so:

auto opt = torch::TensorOptions()
               .dtype(torch::kUInt8)
               .device(torch::DeviceType::CUDA)
               .memory_format(c10::MemoryFormat::Contiguous);
auto buffer = torch::zeros(size, opt);
auto *array = buffer.data_ptr<uint8_t>();

However, if I try to index into that pointer or do anything else with it, it segfaults. Why? The memory should be contiguous. I imagine it has something to do with memory formatting and/or locks/access, but I can’t seem to find good documentation on it. I would also be grateful if you happen to know of any resources that could help in this direction.

array[0]; // segfaults

It seems you are trying to access device data from the host, which is undefined behavior and can segfault.

Thanks for the reply! I am indeed accessing it from the host. So, you would say this kind of access has to happen from a CUDA block?

The weird thing is that this worked in a previous installation. Did PyTorch add memory protection recently?

Yes, you need to either access the device array in a kernel or copy it back to the CPU.
I don’t think this should ever have worked, as this is the expected behavior in CUDA; PyTorch did not add any memory protection.
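
For the libtorch snippet from the first post, the copy-back route could look roughly like this (a minimal sketch, assuming the buffer tensor from above; .cpu() performs the device-to-host copy, so the resulting pointer is safe to dereference on the host):

auto host_buffer = buffer.cpu();                    // device -> host copy
auto *host_array = host_buffer.data_ptr<uint8_t>(); // now a plain host pointer
printf("%d\n", host_array[0]);                      // safe to index on the host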

E.g. look at this simple example which:

  • allocates host and device memory
  • fills the host array with values
  • copies the host array to the device array via cudaMemcpy
  • launches the compute kernel, which indexes the device array
  • copies the device array back to the host
  • prints the host array via indexing it
  • frees the allocations
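
A condensed sketch of those steps might look like this (not the full example referenced above; the array size and kernel are made up for illustration, and error checking is omitted):

#include <cstdio>
#include <cuda_runtime.h>

// Kernel indexes the device array; device memory may only be touched here.
__global__ void add_one(int *da, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) da[i] += 1;
}

int main() {
    const int n = 16;
    int ha[n];                                    // host array
    int *da = nullptr;
    cudaMalloc(&da, n * sizeof(int));             // device allocation

    for (int i = 0; i < n; ++i) ha[i] = i;        // fill host array with values

    cudaMemcpy(da, ha, n * sizeof(int), cudaMemcpyHostToDevice); // host -> device

    add_one<<<1, n>>>(da, n);                     // kernel indexes the device array

    // printf("%d\n", da[0]);                     // invalid: host dereference of a device pointer

    cudaMemcpy(ha, da, n * sizeof(int), cudaMemcpyDeviceToHost); // device -> host

    for (int i = 0; i < n; ++i) printf("%d ", ha[i]); // print host array via indexing
    printf("\n");

    cudaFree(da);                                 // free the device allocation
    return 0;
}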

Now, add an invalid access, e.g. via printf("%d\n", da[0]); in line 69, and you will get a Segmentation fault.

Thank you for your replies! I can see now why this shouldn’t work. I resolved it by writing CUDA kernels that operate on the memory allocated on the GPU.
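
For completeness, the kernel route with the original buffer tensor could look something like this rough sketch (the kernel name, fill value, and launch configuration are made up; error checking is omitted):

// Device side: the kernel receives the tensor's device pointer and indexes it there.
__global__ void fill_bytes(uint8_t *data, int64_t n) {
    int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] = 42;
}

// Host side: pass the CUDA tensor's data_ptr into the kernel instead of dereferencing it.
auto *array = buffer.data_ptr<uint8_t>();
int64_t n = buffer.numel();
int threads = 256;
int blocks = static_cast<int>((n + threads - 1) / threads);
fill_bytes<<<blocks, threads>>>(array, n);
cudaDeviceSynchronize(); // wait for the kernel before using the results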