I am using a test article as input and trying to generate a reasonably long summary, but I keep getting very short, single-line summaries from a very long article. I would appreciate any help.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
with open("Data/article.txt", "r") as file:
    article = file.read()
model_name = 'google/pegasus-xsum'
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer(article, max_length=510, truncation=True, return_tensors='pt').input_ids
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Length constraints belong on generate(), not on decode();
# decode() has no min_length/max_length parameters.
outputs = model.generate(inputs, min_length=50, max_new_tokens=200, do_sample=False)
# generate() returns a batch, so decode the first sequence
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
I learned that models whose names end in 'xsum' are fine-tuned on the XSum dataset, which targets single-sentence summaries. As such, I will modify the test to use a different model and post again.
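For anyone hitting the same issue, here is a minimal sketch of the swap. It assumes `facebook/bart-large-cnn` as the substitute checkpoint (any model fine-tuned on CNN/DailyMail-style multi-sentence summaries should behave similarly), and uses an inline stand-in article so the snippet is self-contained:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed substitute checkpoint: trained on CNN/DailyMail, which has
# multi-sentence reference summaries, unlike the single-sentence XSum.
model_name = "facebook/bart-large-cnn"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Stand-in article text; in the original script this comes from Data/article.txt.
article = (
    "The city council voted on Tuesday to approve a new public transit plan. "
    "The plan adds three bus routes, extends service hours on weekends, and "
    "sets aside funding for electric buses over the next five years. "
    "Supporters argued the changes would cut commute times and reduce "
    "emissions, while critics questioned the cost and the timeline. "
    "Construction of new bus shelters is expected to begin next spring, and "
    "the first electric buses are scheduled to enter service the following year."
)

inputs = tokenizer(article, max_length=1024, truncation=True, return_tensors="pt")
# Length constraints go on generate(), not decode().
outputs = model.generate(
    inputs.input_ids,
    min_length=50,
    max_length=200,
    num_beams=4,
    do_sample=False,
)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```

With `min_length=50` the model is forced past a one-liner, but the checkpoint choice matters more: an xsum model will compress to one sentence regardless of the generation flags.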