PyTorch nn.Embedding takes too much memory

Hello, I’m implementing a deep neural network and I use torch.nn.Embedding to encode the input data. The problem is that the input IDs go up to 120,000, and when I run the model it requires many GB of RAM. Do you have any idea how I can solve this problem?

How are you loading your data? Is the entirety of the input data loaded into memory ahead of time?

If so, you could consider loading it lazily, e.g. using an iterable-style Dataset (IterableDataset) or an IterDataPipe from TorchData, as in the sketch below.
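A minimal sketch of lazy loading with an IterableDataset, assuming the token IDs sit in a plain-text file (a hypothetical ids.txt with one space-separated sequence per line):

import torch
from torch.utils.data import IterableDataset, DataLoader

class LazyIdDataset(IterableDataset):
    def __init__(self, path):
        self.path = path

    def __iter__(self):
        # Stream one sample at a time instead of loading the whole file into memory.
        with open(self.path) as f:
            for line in f:
                ids = [int(tok) for tok in line.split()]
                yield torch.tensor(ids, dtype=torch.long)

loader = DataLoader(LazyIdDataset("ids.txt"), batch_size=None)
for sample in loader:
    ...  # forward pass, e.g. model(sample)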


It seems you are concerned about the size of this particular embedding layer.
I don’t know how large the embedding_dim is, but for num_embeddings=120000 and embedding_dim=1024, the weight matrix only takes ~470 MB (120,000 × 1,024 parameters × 4 bytes in float32):

import torch
import torch.nn as nn

print(torch.cuda.memory_allocated() / 1024**2)
# 0.0

# 120000 x 1024 float32 weights -> ~469 MB on the GPU
emb = nn.Embedding(120000, 1024).cuda()
print(torch.cuda.memory_allocated() / 1024**2)
# 468.75

Thank you guys for your answers! I figured out where the problem was: somewhere a 120,000 × 10,000 embedding layer had been defined, and that generated the OOM error.
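For reference, a quick back-of-the-envelope check (a sketch, not from the thread itself) shows why that layer blows up: its float32 weights alone need roughly 4.5 GiB, before gradients and optimizer states are counted.

# Rough size of the oversized layer's float32 weights only
num_embeddings, embedding_dim = 120_000, 10_000
print(num_embeddings * embedding_dim * 4 / 1024**3)  # ~4.47 GiB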