How to train a neural network model with BERT embeddings instead of static embeddings like GloVe/fastText?

I am looking for some guidance on training a conventional neural network model with BERT embeddings that are generated dynamically (BERT produces contextualized embeddings, so the same word gets a different embedding depending on the context it appears in).

In a normal neural network model, we would initialize the embedding layer with GloVe or fastText vectors like this:

import torch.nn as nn

# embedding matrix of shape (vocab_size, vector_size)
embed = nn.Embedding(vocab_size, vector_size)
# copy the pretrained static vectors into the embedding weights
embed.weight.data.copy_(some_variable_containing_vectors)

Instead of copying static vectors like this and using them for training, I want to pass every input through a BERT model, generate embeddings for the words on the fly, and feed those to the model for training.

So should I change the forward function of the model to incorporate those embeddings?

Any help would be appreciated!


Take a look at the HuggingFace “transformers” repo.

It can download the pretrained BERT model of your choice and gives you a pretty straightforward interface for tokenization and for extracting embeddings.

The rough outline of your code will look like this:
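Something along these lines should work as a starting point (I am using bert-base-uncased and a small LSTM classifier on top purely as placeholders; swap in whichever checkpoint and downstream model you actually need):

import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

# Load a pretrained tokenizer and BERT model
# ("bert-base-uncased" is only a placeholder -- use whichever checkpoint you need)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
bert.eval()

# A small downstream model that consumes BERT's contextual token embeddings
# instead of looking them up in an nn.Embedding layer
class Classifier(nn.Module):
    def __init__(self, bert_dim=768, hidden_dim=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(bert_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_embeddings):
        # token_embeddings: (batch, seq_len, bert_dim)
        _, (h_n, _) = self.lstm(token_embeddings)
        return self.fc(h_n[-1])

model = Classifier()

# Inside the training loop: tokenize the batch, run it through BERT,
# and feed the resulting contextual embeddings to the downstream model
sentences = ["The bank raised interest rates.", "He sat on the river bank."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():  # drop no_grad() if you also want to fine-tune BERT
    bert_out = bert(**encoded)

token_embeddings = bert_out.last_hidden_state  # (batch, seq_len, 768)
logits = model(token_embeddings)

Whether you call BERT inside your model's forward function or keep it outside as above is up to you; the key point is that the contextual embeddings replace the output of the static nn.Embedding lookup.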
