Is it possible to share a CUDA tensor across different processes?


I want to train many models on my GPU cards.
Each model has a large, fixed embedding matrix, and each model is trained separately (I am not training one model across multiple GPUs).

In order to fit more models on one card, I have to reduce the memory cost on each card.
So I am wondering: is there a way to share the biggest CUDA tensor (my embedding matrix) across different processes, so that only one copy of this matrix exists on each card?

Looking forward to solutions and suggestions.
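For reference, PyTorch's `torch.multiprocessing` can share CUDA tensors between processes: a CUDA tensor passed to a child process is transferred as a CUDA IPC handle, so the child maps the same device memory rather than copying it. Below is a minimal sketch of that mechanism; it assumes PyTorch with a CUDA device, and the names `worker`, `emb`, and the matrix shape are illustrative, not from the original post.

```python
import torch
import torch.multiprocessing as mp


def worker(embedding: torch.Tensor, rank: int) -> None:
    # `embedding` maps the same device memory as in the parent process
    # (shared via a CUDA IPC handle), so reading it here does not
    # allocate a second copy of the matrix on the card.
    print(f"worker {rank} sees tensor at device pointer {embedding.data_ptr()}")


if __name__ == "__main__" and torch.cuda.is_available():
    # The spawn start method is required when passing CUDA tensors
    # to child processes.
    mp.set_start_method("spawn")

    # The single shared embedding matrix (illustrative size).
    emb = torch.randn(1_000_000, 128, device="cuda")

    procs = [mp.Process(target=worker, args=(emb, rank)) for rank in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

One caveat: the producer process must stay alive for as long as any consumer uses the shared tensor, since the memory belongs to the producer's allocation.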

How about moving the embedding lookup into a preprocessing step, to avoid doing it on the fly?

To be honest, that's the last thing I want to do…
If there turns out to be no way to share tensors, I will preprocess the data.