I’m working on a project that runs multiple instances of the same model on one GPU (using torch.multiprocessing). The models are RNNs that process blocks of text and do not interact with each other.
I’m using nn.Embedding and loading the full GloVe vector file (about 2 million words, vector size 300) in each model, which takes up a lot of GPU memory. I’m trying to see whether I can share the embedding memory between the processes on the same GPU. I understand I could limit the number of words, but that is not the goal of this exercise.
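For reference, this is roughly how each process sets up its embedding at the moment (a minimal sketch; the loader and the glove.840B.300d.txt path stand in for my actual loading code):

```python
import torch
import torch.nn as nn

def load_glove_matrix(path):
    # placeholder for my real loader: parse "word v1 ... v300" lines into a float tensor
    vectors = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors.append([float(x) for x in parts[1:]])
    return torch.tensor(vectors)

glove = load_glove_matrix("glove.840B.300d.txt")  # ~2M words x 300 dims, on the CPU

# nn.Embedding allocates its own weight tensor, so this copies the whole matrix...
embedding = nn.Embedding(glove.size(0), glove.size(1))
embedding.weight.data.copy_(glove)

# ...and every process then pushes its own copy onto the GPU
embedding = embedding.cuda()
```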
Looking in tensor.py I see the share_memory_() function, and nn.Embedding appears to be built out of tensors, but is there a way to share the entire nn.Embedding instance? Simply sharing the GloVe vector tensors won’t help, because nn.Embedding appears to allocate its own copy of the weights.
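To make that concrete, here is roughly what I was hoping would work, continuing the sketch above (the worker body is elided):

```python
import torch.multiprocessing as mp

def worker(embedding):
    pass  # each worker would run its own RNN, looking indices up in the shared weight

if __name__ == "__main__":
    # nn.Module.share_memory() just calls share_memory_() on every parameter tensor
    embedding.share_memory()
    mp.set_start_method("spawn")  # required for CUDA tensors with multiprocessing
    procs = [mp.Process(target=worker, args=(embedding,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```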
My other option would be to skip nn.Embedding entirely and convert the word indices to vectors on the CPU before they are passed into the RNN, but this would be slower.
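A sketch of that fallback, again continuing from the first snippet: keep one shared copy of the matrix in host memory (CPU-side share_memory_() does what I want) and only ship the looked-up vectors to the GPU.

```python
# one shared copy of the GloVe matrix in host shared memory
glove_shared = glove.share_memory_()

def embed_on_cpu(word_indices):
    # word_indices: LongTensor of token ids for one block of text
    vectors = glove_shared.index_select(0, word_indices)  # (seq_len, 300), on the CPU
    return vectors.cuda()  # extra host-to-device copy per batch, hence slower
```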
One curious thing I see in share_memory_() in storage.py is this:
if self.is_cuda:
    pass  # CUDA doesn't use POSIX shared memory
In tensor.py, this is the body of share_memory_():
self.storage().share_memory_()
return self
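A quick sanity check I can run against that reading, comparing the storage pointer before and after the call on a CUDA tensor:

```python
import torch

t = torch.randn(5, 3).cuda()
before = t.storage().data_ptr()
t.share_memory_()  # hits the is_cuda branch quoted above, which just passes
after = t.storage().data_ptr()
print(before == after)  # True would mean the storage was never moved
```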
To me this looks like calling share_memory_() on a CUDA tensor does nothing. What am I missing?