I am playing around with summarization, following this tutorial. Instead of using the CNNDM dataset, I am just copying the text of a news article from the internet. How should I format this string so that the T5 model can interpret it? This is what I have tried so far:
```python
import torch
from torchtext.models import T5_BASE_GENERATION, T5Transform
from torchtext.prototype.generate import GenerationUtils

torch.set_default_tensor_type('torch.cuda.FloatTensor')
device = torch.device("cuda")

padding_idx = 0
eos_idx = 1
max_seq_len = 512
t5_sp_model_path = "https://download.pytorch.org/models/text/t5_tokenizer_base.model"

transform = T5Transform(
    sp_model_path=t5_sp_model_path,
    max_seq_len=max_seq_len,
    eos_idx=eos_idx,
    padding_idx=padding_idx,
)

t5_base = T5_BASE_GENERATION
transform = t5_base.transform()  # note: this replaces the T5Transform built above
transform = transform.to(device)

model = t5_base.get_model()
model.eval()
model = model.to(device)

sequence_generator = GenerationUtils(model)
sequence_generator.device = device

beam_size = 1
model_input = transform("summarize: " + text_of_article)
model_output = sequence_generator.generate(model_input, eos_idx=eos_idx, num_beams=beam_size)
output_text = transform.decode(model_output.tolist())
```
When I do this, I get the error:
```
AssertionError: For batched (3-D) `query`, expected `key` and `value` to be 3-D but found 2-D and 2-D tensors respectively
```
which I assume means I haven't formatted the input correctly. The tutorial mentions that the T5 model requires batched data, but I don't see how my input differs from what is in the tutorial:
```python
batch = next(iter(cnndm_dataloader))
input_text = batch["article"]
target = batch["abstract"]
beam_size = 1
model_input = transform(input_text)
```
It looks like they are just getting the next article (prefixed with the task) from the dataloader. I am new to PyTorch and NLP in general, so any help would be greatly appreciated!
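My current guess is that it has something to do with the tutorial passing a *list* of strings (the dataloader batch) while I pass a single string. Here is a toy sketch of what I mean by the shape difference (`toy_transform` and its fake token ids are made up purely for illustration; the real `T5Transform` returns SentencePiece token ids, not this):

```python
# Hypothetical stand-in for the tokenizer transform -- only the nesting
# (batched vs. un-batched) matters, not the actual id values.
def toy_transform(inp):
    def encode(text):
        return [ord(c) % 32000 for c in text]  # fake token ids
    if isinstance(inp, str):
        return encode(inp)            # un-batched: shape (seq_len,)
    return [encode(t) for t in inp]   # batched: shape (batch, seq_len)

single = toy_transform("summarize: some article")    # one sequence
batched = toy_transform(["summarize: some article"])  # batch of one sequence
print(batched[0] == single)  # True: same ids, just one extra batch dimension
```

Is that extra batch dimension what the model is complaining about, or is something else going on?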