Implementing custom .to function

Hi, I created my own backend (based on PrivateUse1), and for now I'm using the CPU to test some functionality.

Currently I'm struggling to implement _to_copy for that device so that it does not copy the data (the underlying storage) but only changes the device property, i.e., creates a new tensor with the same underlying storage but a different device.
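For context, I register the kernel for the backend roughly like this (a minimal sketch; my_to_copy is a placeholder name, and the signature follows the aten::_to_copy schema):

#include <ATen/ATen.h>
#include <torch/library.h>

// Placeholder kernel for the PrivateUse1 backend.
at::Tensor my_to_copy(
    const at::Tensor& self,
    c10::optional<at::ScalarType> dtype,
    c10::optional<at::Layout> layout,
    c10::optional<at::Device> device,
    c10::optional<bool> pin_memory,
    bool non_blocking,
    c10::optional<at::MemoryFormat> memory_format) {
  TORCH_CHECK(device.has_value(), "my_to_copy: expected a target device");
  // Goal: return a tensor that shares self's storage, with only the device changed.
  return self;  // stub; building the real result is what the rest of this post is about
}

TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("_to_copy", &my_to_copy);
}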

So far I've realized I need to ‘change’ the device in three places: key_set, device_opt_ and storage_.storage_impl_.data_ptr_.device_. Is there a way I can do that? Did I miss something?

I was able to change the last one using unsafeGetStorageImpl()->mutable_data_ptr().unsafe_set_device(new_device). Is there a similar setter for device_opt_ as well?
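Spelled out (a sketch, assuming a PyTorch version where StorageImpl exposes mutable_data_ptr()), that change looks like this; note it only covers the third location, and it mutates storage that may be shared with other tensors:

#include <ATen/ATen.h>

// Device change at the DataPtr level only; key_set and device_opt_ on the
// TensorImpl still report the old device after this call.
void set_storage_device(const at::Tensor& t, c10::Device new_device) {
  t.storage().unsafeGetStorageImpl()
      ->mutable_data_ptr()
      .unsafe_set_device(new_device);
}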

I wrote something like this:

auto storage = self.storage();  // copies the handle; the StorageImpl is still shared with self

// Build a DataPtr that points at the same data and context but reports the new device.
auto new_data_pt = at::DataPtr(
    storage.data_ptr().get(),
    storage.data_ptr().get_context(),
    storage.data_ptr().get_deleter(),
    device.value()
);

// Release the context from the old DataPtr so its deleter does not run when it
// is replaced, then swap in the new one (this mutates the shared StorageImpl).
storage.mutable_data_ptr().release_context();
storage.set_data_ptr(std::move(new_data_pt));

// Dispatch keys for the target backend (assuming PrivateUse1).
c10::DispatchKeySet key_set(c10::DispatchKey::PrivateUse1);

c10::intrusive_ptr<at::TensorImpl> result_impl =
    c10::make_intrusive<at::TensorImpl>(
        c10::Storage(storage),
        key_set,
        self.dtype()
    );
result_impl->set_sizes_and_strides(self.sizes(), self.strides());
result_impl->set_storage_offset(self.storage_offset());  // preserve the offset
auto result = at::Tensor(std::move(result_impl));

But I would prefer to have a unique pointer to the data instead of changing the ownership (as written, this code will crash when the old tensor outlives the new one).

Take a look at: https://github.com/pytorch/FBGEMM/blob/fe980ab54a6e28818d81c8694b6564e7f804418b/fbgemm_gpu/src/memory_utils/memory_utils.cu
CUDA UVM memory can also be accessed from both the CPU and the CUDA device, and adding a storage indirection solved the reference-counting issue there.
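To illustrate the pattern (a minimal sketch, not the FBGEMM code itself; alias_on_device is a hypothetical name, and PrivateUse1 is assumed as the target backend): the new tensor gets its own StorageImpl that aliases the same bytes, while the DataPtr context owns a reference to the original Storage, so the allocation stays alive no matter which tensor is destroyed first.

#include <ATen/ATen.h>
#include <c10/core/StorageImpl.h>

at::Tensor alias_on_device(const at::Tensor& self, c10::Device new_device) {
  const c10::Storage& orig = self.storage();

  // The context is an owning copy of the original Storage; the deleter drops
  // that reference instead of freeing the raw allocation.
  auto* ctx = new c10::Storage(orig);
  at::DataPtr data_ptr(
      orig.data_ptr().get(),  // same underlying bytes...
      ctx,
      +[](void* c) { delete static_cast<c10::Storage*>(c); },
      new_device);            // ...reported on the new device

  auto storage_impl = c10::make_intrusive<c10::StorageImpl>(
      c10::StorageImpl::use_byte_size_t(),
      orig.nbytes(),
      std::move(data_ptr),
      /*allocator=*/nullptr,
      /*resizable=*/false);

  auto impl = c10::make_intrusive<at::TensorImpl>(
      c10::Storage(std::move(storage_impl)),
      c10::DispatchKeySet(c10::DispatchKey::PrivateUse1),  // assumed backend
      self.dtype());
  impl->set_sizes_and_strides(self.sizes(), self.strides());
  impl->set_storage_offset(self.storage_offset());
  return at::Tensor(std::move(impl));
}

Because each tensor now has its own StorageImpl, neither one's lifetime depends on the other: whichever context deleter runs last drops the final reference to the original storage.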