Can we use pre-trained word embeddings for weight initialization in nn.Embedding?

Thanks so much. Do you know if there’s any easy way to vectorize this kind of selection operation?

If you have a mask marking the cells that should be frozen, and two full embedding matrices, one frozen and one dynamic, you could write:

dynamic = dynamic * mask + frozen

The frozen matrix contains the pretrained value at each frozen position and 0 elsewhere. The mask contains 0 wherever a frozen parameter should be used and 1 elsewhere.

You can build the mask and frozen matrix during initialization.
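A minimal sketch of that idea, in case it helps (the vocabulary size, embedding dimension, frozen rows, and pretrained values below are all made up for illustration):

import torch
import torch.nn as nn
import torch.nn.functional as F

n_vocab, emb_dim = 1000, 50                         # hypothetical sizes
frozen_ids = torch.tensor([0, 1, 2])                # hypothetical rows to keep fixed
pretrained = torch.randn(len(frozen_ids), emb_dim)  # stand-in for real pretrained vectors

# dynamic: the trainable embedding matrix
dynamic = nn.Parameter(torch.randn(n_vocab, emb_dim))

# mask: 0 on frozen rows, 1 elsewhere
mask = torch.ones(n_vocab, 1)
mask[frozen_ids] = 0.0

# frozen: pretrained values on frozen rows, 0 elsewhere
frozen = torch.zeros(n_vocab, emb_dim)
frozen[frozen_ids] = pretrained

def embed(idx):
    # gradients only reach the rows where mask is 1
    weight = dynamic * mask + frozen
    return F.embedding(idx, weight)

out = embed(torch.tensor([0, 5]))  # row 0 comes from frozen, row 5 stays trainable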


For latecomers, see nn.Embedding.from_pretrained.
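A minimal example with a made-up weight matrix:

import torch
import torch.nn as nn

pretrained = torch.randn(3, 5)  # made-up 3-word vocabulary, 5-dim vectors

emb = nn.Embedding.from_pretrained(pretrained)                          # frozen (freeze=True is the default)
emb_trainable = nn.Embedding.from_pretrained(pretrained, freeze=False)  # fine-tunable copy

print(torch.equal(emb(torch.tensor([1])), pretrained[1:2]))  # True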


Thanks for the help from you all!

I also wrote a short snippet that shows how to load pre-trained embeddings from spaCy into nn.Embedding.
Hope it helps!

import spacy
nlp = spacy.load('en_core_web_md')
import torch
import torch.nn as nn
import numpy as np

n_vocab, vocab_dim = nlp.vocab.vectors.shape
emb = nn.Embedding(n_vocab, vocab_dim)

# Load pretrained embeddings
emb.weight.data.copy_(torch.from_numpy(nlp.vocab.vectors.data))

# --- Equivalence test between the spaCy vectors and the torch embedding ---
test_vocab = ['apple', 'bird', 'cat', 'dog', 'egg', 'e12dsafdsf1']

# dict mapping a vocab string hash to its row index in the word vector matrix
key2row = nlp.vocab.vectors.key2row


for v in test_vocab:
    vocab_id = nlp.vocab.strings[v]
    spacy_vec = nlp.vocab[v].vector
    row = key2row.get(vocab_id, None)
    if row is None:
        print('{} is oov'.format(v))
        continue
    vocab_row = torch.tensor(row, dtype=torch.long)
    embed_vec = emb(vocab_row)
    print(np.allclose(spacy_vec, embed_vec.detach().numpy()))

Load pre-trained GloVe embeddings

import torchtext
import torch.nn as nn

glove = torchtext.vocab.GloVe(name='6B', dim=300)
embedding_layer = nn.Embedding.from_pretrained(glove.vectors)
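
If I'm not mistaken, glove.stoi maps a token to its row index, so you can sanity-check the layer against the raw vectors:

import torch

idx = torch.tensor([glove.stoi['apple']])
print(torch.equal(embedding_layer(idx)[0], glove.vectors[glove.stoi['apple']]))  # should print True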