Dynamic quantization of a fine-tuned T5 model

Hi All,

I tried quantizing a fine-tuned T5 model:

import time

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('/path/to/finetuned/model')
model = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

tokenizer = T5Tokenizer.from_pretrained('t5-base')

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

list = ["What is you name ?",
"Where do you live ?",
"You should say hi and greet him.",
"If John has cash 80 dollars then tell him to transfer",
"Tell John to buy some vegetables",]

outputs = []
timings = []

for sentence in sentences:

    start = time.perf_counter()

    text = "paraphrase: " + sentence + " </s>"

    max_len = 256

    encoding = tokenizer.encode_plus(text, max_length=max_len, padding="max_length", truncation=True, return_tensors="pt")
    input_ids, attention_masks = encoding["input_ids"].to(device), encoding["attention_mask"].to(device)

    # sample with top_k = 120 and top_p = 0.98, returning a single sequence
    beam_outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_masks,
        do_sample=True,
        max_length=256,
        top_k=120,
        top_p=0.98,
        early_stopping=True,
        num_return_sequences=1,
    )

    outputs.append(tokenizer.decode(beam_outputs[0], skip_special_tokens=True, clean_up_tokenization_spaces=True))
    timings.append("time taken = {}".format(time.perf_counter() - start))

	
for output in outputs:
    print(output)

for timing in timings:
    print(timing)

While running inference, it prints:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

and I get an empty string as the output. What is wrong?
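For reference, here is a minimal, self-contained sketch of what I expected to work, kept entirely on CPU, since as far as I understand torch.quantization.quantize_dynamic only produces CPU int8 kernels. The checkpoint path is a placeholder, and the final decode deliberately keeps special tokens so I can see whether the model emits anything besides pad/eos:

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Placeholder path: substitute your own fine-tuned checkpoint.
model = T5ForConditionalGeneration.from_pretrained('/path/to/finetuned/model')
model.eval()

# Dynamic quantization swaps nn.Linear weights for int8 versions;
# the resulting kernels run on CPU, so nothing is moved to CUDA here.
quantized = torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)

tokenizer = T5Tokenizer.from_pretrained('t5-base')
inputs = tokenizer("paraphrase: Where do you live?", return_tensors="pt")

with torch.no_grad():
    out = quantized.generate(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_length=64)

# Decode WITHOUT skipping special tokens: if the result is only <pad></s>,
# the model is generating nothing but special tokens, which would explain
# the empty string I see after skip_special_tokens=True.
print(tokenizer.decode(out[0], skip_special_tokens=False))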