I want to train many models on my GPUs.
Each model has a large, fixed embedding matrix, and each model is trained independently (I am not training a single model across multiple GPUs).
To fit more models on one card, I need to reduce the memory cost on each card.
So I am wondering: is there a way to share the biggest CUDA tensor (my embedding matrix) across processes, so that only one copy of the matrix exists on each card?
Looking forward to any solutions or suggestions.