Weird behavior when trying to export an fp16 model to ONNX


I am hoping to use an fp16 model for inference (starting from a model trained in fp32). I tried to use apex.amp to convert the pegasus-xsum model from Hugging Face to fp16. At least the encoder part succeeds, but I am unable to export it to ONNX. The export prints out lots of numbers like the output below:

-0.3096 -0.0386 -0.2583 0.1227 -0.0636 -0.6738 0.1021 -0.6738
-0.4084 1.1182 -0.4387 -0.6660 -0.3435 0.5137 -1.0996 -0.6328
[ torch.cuda.HalfTensor{96103,1024} ]

But no ONNX model was created at the path I specified.

However, when I use the same code to export the original (fp32) model to ONNX, it works. Any ideas why this happens?

Detailed code:

import copy

import torch
import apex
from torch.optim import AdamW
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "google/pegasus-xsum"

pegasus_model = PegasusForConditionalGeneration.from_pretrained(model_name)
pegasus_to_convert = copy.deepcopy(pegasus_model)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
pegasus_to_convert.to(device)  # apex.amp.initialize expects the model to already be on the GPU
optimizer = AdamW(pegasus_to_convert.parameters())
converted_o2, optimizer = apex.amp.initialize(pegasus_to_convert, optimizer, opt_level="O2")
#for name, param in converted_o2.named_parameters():
#    print(name,
tokenizer = PegasusTokenizer.from_pretrained(model_name)
dummy_input = tokenizer("This is an amazing sentence.", return_tensors="pt").to(device)
converted_encoder = converted_o2.model.encoder
ori_encoder = pegasus_model.model.encoder  # encoder of the original fp32 model
output_converted_encoder_path = "xx"
print("Exporting to ONNX:")
# If we export ori_encoder, this works:
export_encoder(ori_encoder, dummy_input["input_ids"], output_converted_encoder_path)
# If we export the converted encoder, it does not work:
export_encoder(converted_encoder, dummy_input["input_ids"], output_converted_encoder_path)

The PyTorch version I use is 1.10.1+cu113. The GPU is an A10G with CUDA 11.2.