I have been training an image captioning model and realized that I was embedding the captions with BERT's contextual embeddings, which completely threw off both training and testing. Is there any way to get non-contextual BERT embeddings?
I've tried the following, i.e. taking just the first layer of BERT's hidden states, but I think it still captures some contextual information. After 6 epochs (50,000 samples in the dataset), my test results are much worse than my training results:
outputs = self.model(**input_ids, output_hidden_states=True)
# hidden_states[0] is the output of BERT's embedding layer (word-piece
# + position + token-type embeddings, then LayerNorm), not the last hidden state
emb_caption = outputs.hidden_states[0]
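Note that `hidden_states[0]` still mixes in position and token-type embeddings, so it is position-dependent even before any attention layer runs. If you want truly static per-token vectors, you can read them straight out of BERT's word-piece lookup table. A minimal sketch, assuming the Hugging Face transformers library and the `bert-base-uncased` checkpoint (swap in whatever model you actually use):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("a dog runs on the beach", return_tensors="pt")

# hidden_states[0]: embedding-layer output (word + position + token-type
# embeddings, LayerNormed) -- NOT fully static, since position is baked in
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
emb_layer_out = outputs.hidden_states[0]

# The raw word-piece lookup table has no position or context information:
# the same token id maps to the same vector wherever it appears
static_table = model.get_input_embeddings()      # an nn.Embedding
static_emb = static_table(inputs["input_ids"])   # pure per-token lookup

print(static_emb.shape)  # (1, seq_len, hidden_size)
```

Both tensors have the same shape, but only the lookup-table version is genuinely non-contextual.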
Solution: if you are using BERT embeddings, there is always going to be some contextual information baked into the embedding. The way I solved it was to re-embed all the words generated so far at each time step during testing, which gave me great captions. However, I was using a CNN-LSTM model, and even though this works, it is more a testament to the power of BERT than to the LSTM architecture, so it's somewhat pointless: BERT did the legwork during training, leaving the LSTM little to no real training to do.
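The per-step idea above can be sketched as a greedy decoding loop: at each step, the caption generated so far is re-run through BERT, so every new token is conditioned on embeddings of the full prefix. Here `dummy_decoder` is a hypothetical stand-in for the CNN-LSTM decoder (a real one would also consume image features); only the tokenizer and model calls are real transformers APIs, and `bert-base-uncased` is an assumed checkpoint:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")


def dummy_decoder(prefix_emb):
    # Placeholder for the CNN-LSTM decoder: it would take the BERT
    # embeddings of the prefix (plus image features) and predict a word.
    return "runs"


generated = ["a", "dog"]
for _ in range(3):
    text = " ".join(generated)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # re-embed the whole prefix so far; contextual w.r.t. the prefix
        prefix_emb = model(**inputs).last_hidden_state
    generated.append(dummy_decoder(prefix_emb))

print(" ".join(generated))
```

The cost is one BERT forward pass per generated token, which is why this is practical at test time but expensive for long captions.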