I saw that it is possible to use CUDA to write to memory-mapped files (reference: https://stackoverflow.com/questions/29518875/cuda-zero-copy-memory-memory-mapped-file ).
I am wondering if it is somehow possible in PyTorch to write a CUDA tensor directly from the GPU to a memory-mapped file.
The purpose of this is to speed up writing tensors after each training step. Currently,

```python
with torch.no_grad():
    numpyMemmap[arrayOfRandomIndexes] = u_embeddings.weight.detach().cpu().numpy()
```
takes about 6 seconds. I think this is because the NumPy memory map lives in CPU memory, so every write forces a device-to-host copy. I need something that writes in a fraction of a second, since I will be storing the tensors after each training step and there will be hundreds of thousands of training steps.
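For reference, here is a minimal, self-contained sketch of the setup above. The names `numpyMemmap`, `arrayOfRandomIndexes`, and `u_embeddings` are from my code; the shapes and file path are just illustrative placeholders (my real embedding table is much larger), and the sketch falls back to CPU when no GPU is present:

```python
import numpy as np
import torch

# Illustrative sizes; the real embedding table is much larger.
vocab_size, dim = 10_000, 128
device = "cuda" if torch.cuda.is_available() else "cpu"

# Embedding whose weights live on the GPU during training.
u_embeddings = torch.nn.Embedding(vocab_size, dim).to(device)

# Memory-mapped file on disk, backed by CPU memory (placeholder path).
numpyMemmap = np.memmap("weights.dat", dtype=np.float32, mode="w+",
                        shape=(vocab_size, dim))

# Rows are written in a shuffled order after each training step.
arrayOfRandomIndexes = np.random.permutation(vocab_size)

# The slow path: device-to-host copy of the whole weight matrix,
# then a fancy-indexed write into the CPU-side memmap.
with torch.no_grad():
    numpyMemmap[arrayOfRandomIndexes] = u_embeddings.weight.detach().cpu().numpy()
numpyMemmap.flush()
```

At my real scale, the fancy-indexed write at the bottom is the line that takes ~6 seconds per step.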