I’m extracting a pointer to the underlying GPU memory of a uint8 tensor like so:
auto opt = torch::TensorOptions()
    .dtype(torch::kUInt8)
    .device(torch::DeviceType::CUDA)
    .memory_format(c10::MemoryFormat::Contiguous);
auto buffer = torch::zeros(size, opt);
auto *array = buffer.data_ptr<uint8_t>();
However, if I try to read from that pointer or do anything else with it on the host, it segfaults. Why? The memory should be contiguous. I imagine it has something to do with memory formatting and/or locks/access, but I can’t seem to find good documentation on it. I would also be grateful if you know of any resources that could help in this direction.
Yes, you need to either access the device array in a kernel or copy it back to the CPU.
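For the copy-back route, here is a minimal sketch; the {1024} size is a placeholder, since the original size variable isn’t shown:

#include <torch/torch.h>
#include <cstdint>

int main() {
  auto opt = torch::TensorOptions()
      .dtype(torch::kUInt8)
      .device(torch::DeviceType::CUDA)
      .memory_format(c10::MemoryFormat::Contiguous);
  auto buffer = torch::zeros({1024}, opt);  // lives in device memory

  // Copy to host first; only then is the raw pointer safe to dereference on the CPU.
  auto host_buffer = buffer.to(torch::kCPU);
  auto *array = host_buffer.data_ptr<uint8_t>();
  return array[0];  // host pointer, no segfault
}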
I don’t think this should ever have worked; dereferencing a device pointer on the host is expected to fail in CUDA, and PyTorch does not add any memory protection etc. on top of that.
Thank you for your replies! I can see now why this shouldn’t work. I resolved it by writing CUDA kernels that operate on the memory allocated on the GPU.
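For anyone landing here later, a minimal sketch of that kernel route; the fill_value kernel, its launch configuration, and the {1024} size are illustrative, not from the original post (a real integration would typically also launch on the stream from at::cuda::getCurrentCUDAStream()):

#include <torch/torch.h>
#include <cuda_runtime.h>
#include <cstdint>

// Illustrative kernel: writes `value` into every byte of the device buffer.
__global__ void fill_value(uint8_t *data, int64_t n, uint8_t value) {
  int64_t i = blockIdx.x * static_cast<int64_t>(blockDim.x) + threadIdx.x;
  if (i < n) {
    data[i] = value;
  }
}

int main() {
  auto opt = torch::TensorOptions()
      .dtype(torch::kUInt8)
      .device(torch::DeviceType::CUDA);
  auto buffer = torch::zeros({1024}, opt);

  auto *array = buffer.data_ptr<uint8_t>();  // device pointer: only valid inside a kernel
  int64_t n = buffer.numel();
  int threads = 256;
  int blocks = static_cast<int>((n + threads - 1) / threads);
  fill_value<<<blocks, threads>>>(array, n, 42);
  cudaDeviceSynchronize();

  // Verify on the host by copying back.
  auto host_buffer = buffer.to(torch::kCPU);
  return host_buffer.data_ptr<uint8_t>()[0] == 42 ? 0 : 1;
}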