Randomly initialized embeddings for torchtext

I’d like to randomly initialize word embeddings - can I do the following:


    import numpy as np
    import torch
    import torch.nn as nn

    TEXT.build_vocab(train_data)
    vocab_size = len(TEXT.vocab)
    embedding_length = 300  # example embedding dimension
    embedding_vectors = torch.FloatTensor(np.random.rand(vocab_size, embedding_length))
    word_embeddings = nn.Embedding(vocab_size, embedding_length)
    word_embeddings.weight = nn.Parameter(embedding_vectors, requires_grad=True)

to do so?
I have heard tales of a parameter for build_vocab that allows for this out of the gate but have yet to sight it myself.

This code snippet would assign embedding vectors to the nn.Embedding layer.
Note that nn.Embedding will already randomly initialize the weight parameter, but you can of course reassign it.
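For example, here is a minimal sketch of both options (the sizes are just placeholders; nn.Embedding.from_pretrained copies the tensor you pass in, and freeze=False keeps the weights trainable):

    import torch
    import torch.nn as nn

    vocab_size, embedding_length = 10000, 300  # example sizes

    # Option 1: rely on nn.Embedding's built-in random initialization
    # (weights are drawn from a standard normal distribution by default).
    word_embeddings = nn.Embedding(vocab_size, embedding_length)

    # Option 2: build your own random tensor and load it into the layer.
    embedding_vectors = torch.rand(vocab_size, embedding_length)
    word_embeddings = nn.Embedding.from_pretrained(embedding_vectors, freeze=False)

    print(word_embeddings.weight.shape)          # torch.Size([10000, 300])
    print(word_embeddings.weight.requires_grad)  # True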

You could also create the tensor via torch.from_numpy(np.random.rand(...)).float() (from_numpy shares memory with the NumPy array, although the .float() cast to float32 will still create a copy), but your code should work as well.
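
A small sketch of that variant, assuming the same placeholder vocab_size and embedding_length as above:

    import numpy as np
    import torch
    import torch.nn as nn

    vocab_size, embedding_length = 10000, 300  # example sizes

    # Create the random weights in NumPy, wrap them with from_numpy
    # (shares memory with the array), then cast to float32.
    weights = torch.from_numpy(np.random.rand(vocab_size, embedding_length)).float()

    word_embeddings = nn.Embedding(vocab_size, embedding_length)
    word_embeddings.weight = nn.Parameter(weights, requires_grad=True)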
