Latent distribution in transformer

I am using a caption-generating transformer, and my goal is to train a regression model on top of it. For that I need access to the latent representation of the predicted text: while training the transformer I want to extract the vector representation of the generated caption, store it, and eventually train the regression model on the stored vectors. However, I cannot figure out how to extract the vector representation of the final sequence. Can you please help me with that?

You can either train an `nn.Embedding` layer yourself or download a pretrained embedding layer.
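For the first option, a minimal PyTorch sketch of a trainable `nn.Embedding` layer is below. The vocabulary size, embedding dimension, and the placeholder loss are illustrative assumptions, not values from your model:

```python
import torch
import torch.nn as nn

# Hypothetical sizes -- match these to your tokenizer/model.
VOCAB_SIZE = 1000
EMBED_DIM = 64

# A trainable embedding layer: maps token ids to dense vectors.
embedding = nn.Embedding(VOCAB_SIZE, EMBED_DIM)

# Toy batch of token-id sequences (batch_size=2, seq_len=5).
token_ids = torch.randint(0, VOCAB_SIZE, (2, 5))
vectors = embedding(token_ids)   # shape: (2, 5, 64)

# The layer's weights are ordinary parameters, so they are updated
# by whatever optimizer trains the surrounding transformer.
optimizer = torch.optim.Adam(embedding.parameters(), lr=1e-3)
loss = vectors.pow(2).mean()     # placeholder loss for illustration only
loss.backward()
optimizer.step()
```

Because the embedding weights receive gradients through the rest of the network, they are learned jointly with the transformer rather than in a separate step.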

Training a layer yourself is covered in this tutorial:

or you can use a pretrained one via Huggingface: