Some partial answers:
- Depending on what you mean by “feature vectors”, you can either access the `.weight` attribute or use forward hooks to get the output activation of this layer, as described here.
- Yes, you could store the features and train another model on them. For this use case, “features” would refer to the output activations. If you instead mean the `.weight` parameter, you would only have a single sample, so training wouldn’t make sense.
- The embeddings would be trained as part of the complete model for your current use case, e.g. a classification. I’m not sure how you would like to train only the embeddings.
- I don’t know what the best approach would be for missing tokens.
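
To make the difference between the two options in the first point concrete, here is a minimal sketch. The `ToyClassifier` model, its layer names, and all sizes are made up for illustration and are not taken from your code; only `nn.Embedding`, its `.weight` attribute, and `register_forward_hook` are the actual PyTorch pieces being shown.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy classifier; names and sizes are illustrative assumptions.
class ToyClassifier(nn.Module):
    def __init__(self, vocab_size=10, embed_dim=4, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, tokens):
        # Average the token embeddings over the sequence, then classify.
        return self.fc(self.embedding(tokens).mean(dim=1))

model = ToyClassifier()

# Option 1: the learned parameter itself, a single [vocab_size, embed_dim] matrix.
weight = model.embedding.weight.detach()

# Option 2: a forward hook that stores the layer's output for every forward pass.
features = []

def save_activation(module, inputs, output):
    # Detach so the stored tensors do not keep the autograd graph alive.
    features.append(output.detach())

handle = model.embedding.register_forward_hook(save_activation)

tokens = torch.randint(0, 10, (3, 5))  # batch of 3 sequences, 5 tokens each
with torch.no_grad():
    model(tokens)
handle.remove()  # stop recording once the features are collected

activations = features[0]  # one embed_dim vector per token per sample
print(weight.shape, activations.shape)
```

The hook yields one activation tensor per input batch, so collecting them over a dataset gives you many samples that a second model could be trained on, whereas `weight` is always the same single matrix.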