Converting GPT-2 to TorchScript

anAnnoyingNerd · March 22, 2023, 7:44pm

I’ve been having trouble converting a GPT-2 model to TorchScript. The conversion itself succeeds, but the output of the converted model is nothing like that of the original model. For example, I converted the model to TorchScript with the sample input “A compound sentence is”. The original model outputs something like “A compound sentence is a sentence that is not a sentence at all.” The TorchScript model just outputs random words that have nothing to do with the input. Can somebody help me figure out how I can fix this?

ptrblck · March 23, 2023, 2:31am

I don’t know whether you are using tracing with data-dependent control flow in your model or scripting. At least the former workflow would create invalid results, since the conditions and code paths taken during tracing are baked into the traced model.
In any case, TorchScript is in maintenance mode and won’t get any major features anymore.
Did you try the newly introduced torch.compile mode instead?
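
To illustrate the tracing issue mentioned above, here is a minimal sketch. `Gate` is a hypothetical toy module, not part of GPT-2; it only shows how `torch.jit.trace` records the branch taken for the example input, while `torch.jit.script` keeps the condition.

```python
import warnings

import torch
import torch.nn as nn


class Gate(nn.Module):
    """Hypothetical toy module with data-dependent control flow."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x * 2
        return -x


model = Gate().eval()
pos = torch.ones(3)   # takes the `x * 2` branch
neg = -torch.ones(3)  # takes the `-x` branch

with warnings.catch_warnings():
    # Tracing warns that a tensor was converted to a Python bool.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(model, pos)  # only the `x * 2` branch is recorded

scripted = torch.jit.script(model)  # the `if` is compiled as real control flow

# Compare both converted models against eager mode on the other branch.
traced_matches = torch.equal(traced(neg), model(neg))
scripted_matches = torch.equal(scripted(neg), model(neg))
print("traced matches eager:", traced_matches)
print("scripted matches eager:", scripted_matches)
```

If the traced model disagrees with eager mode on an input that differs from the tracing example, a baked-in code path is a likely suspect.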

fabian_schutze · March 23, 2023, 6:58am

In addition to what @ptrblck said: if deploying the whole model does not work immediately, I usually slice the model into parts and JIT the first part, then the first and second parts, and so on, to see at which stage the JIT results diverge.
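
A rough sketch of that stage-by-stage check, assuming the model can be written as an `nn.Sequential`. The `stages` model and the helper `first_diverging_stage` are made up for illustration; a real GPT-2 would have to be split by hand into its embedding, transformer blocks, and head.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a model that has been split into stages.
stages = nn.Sequential(
    nn.Linear(8, 16),
    nn.GELU(),
    nn.Linear(16, 8),
).eval()


def first_diverging_stage(stages, example, probe, atol=1e-5):
    """Trace growing prefixes of `stages` and return the 1-based index of the
    first prefix whose traced output differs from eager mode on `probe`,
    or None if every prefix agrees."""
    for n in range(1, len(stages) + 1):
        prefix = stages[:n]  # slicing an nn.Sequential yields an nn.Sequential
        traced = torch.jit.trace(prefix, example)
        with torch.no_grad():
            if not torch.allclose(prefix(probe), traced(probe), atol=atol):
                return n
    return None


example = torch.randn(2, 8)  # input used for tracing
probe = torch.randn(2, 8)    # different input used for the comparison
result = first_diverging_stage(stages, example, probe)
print("first diverging stage:", result)
```

Comparing on an input other than the one used for tracing matters, because a traced model reproduces the eager result on its own example input almost by construction.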

anAnnoyingNerd · March 23, 2023, 1:43pm

I’ll have to set up WSL, because Windows isn’t supported by torch.compile() yet.

anAnnoyingNerd · March 23, 2023, 1:44pm

Could you explain more?

Mr_young · December 30, 2023, 12:14pm

Hi, can you show your code for converting a GPT-2 model to TorchScript?