from transformers import AutoModel, AutoModelForSeq2SeqLM

model_name = "t5-base"  # example checkpoint; any seq2seq model works

modelSeq2Seq = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
So this seq2seq model has an extra fully connected layer (lm_head) mapping the hidden size (768) to the number of tokens in the vocabulary, but the plain model does not include it. Why is that?
Even during the pre-training phase there has to be a layer that converts the embeddings to logits, right?
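Continuing the snippet above, printing a few attributes shows the difference (the shapes below assume t5-base; other checkpoints will differ):

print(type(modelSeq2Seq).__name__)  # T5ForConditionalGeneration
print(type(model).__name__)         # T5Model
print(modelSeq2Seq.lm_head)         # Linear(in_features=768, out_features=32128, bias=False)
print(hasattr(model, "lm_head"))    # False: the plain model ends at the decoder hidden states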
@ptrblck, I hope you can help.
I’m not an expert in HF models, but from the docs:
One can use T5ForConditionalGeneration (or the Tensorflow/Flax variant), which includes the language modeling head on top of the decoder.
it seems this model abstraction would contain the lm_head. You might also want to cross-post this question in the HF discussion board, as the devs would know exactly how these models are defined.
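One way to check at the output level as well (a minimal sketch, assuming a T5 checkpoint; the input strings are arbitrary placeholders):

from transformers import AutoModel, AutoModelForSeq2SeqLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("t5-base")
seq2seq = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
plain = AutoModel.from_pretrained("t5-base")

enc = tokenizer("translate English to German: hello", return_tensors="pt")
dec_ids = tokenizer("hallo", return_tensors="pt").input_ids

with torch.no_grad():
    lm_out = seq2seq(input_ids=enc.input_ids, decoder_input_ids=dec_ids)
    plain_out = plain(input_ids=enc.input_ids, decoder_input_ids=dec_ids)

print(lm_out.logits.shape)                # [1, dec_len, 32128] -> vocabulary logits via lm_head
print(plain_out.last_hidden_state.shape)  # [1, dec_len, 768]   -> raw decoder hidden states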
Hey @ptrblck, thanks for the response …
I will post in the HF discussion board.