Converting GPT-2 to TorchScript

I’ve been having trouble converting a GPT-2 model to TorchScript. The conversion itself succeeds, but the converted model’s output is nowhere near the original model’s. For example, I converted the model to TorchScript with the sample input “A compound sentence is”. The original model outputs something like “A compound sentence is a sentence that is not a sentence at all.” The TorchScript model just outputs random words that have nothing to do with the input. Can somebody help me figure out how to fix this?

I don’t know whether you are using tracing or scripting. If you are tracing a model with data-dependent control flow, the results would be invalid, since the conditions and code paths taken for the example input are baked into the traced model.
In any case, TorchScript is in maintenance mode and won’t get any major features anymore.
Did you try the newly introduced torch.compile instead?
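
To illustrate the tracing caveat, here is a minimal sketch using a toy module (the `Toy` class and its branch are hypothetical stand-ins, not GPT-2 code). It assumes the problem is a data-dependent `if`; a generation loop can misbehave in a similar way. Tracing records only the branch taken for the example input, while scripting compiles both branches.

```python
import warnings

import torch


class Toy(torch.nn.Module):
    """Hypothetical module with a data-dependent branch."""

    def forward(self, x):
        # Which branch runs depends on the values in x, not just its shape.
        if x.sum() > 0:
            return x + 100
        return x - 100


model = Toy().eval()
pos = torch.ones(3)    # sum > 0, takes the first branch
neg = -torch.ones(3)   # sum < 0, takes the second branch

with warnings.catch_warnings():
    # Tracing emits a TracerWarning about the tensor-to-bool conversion.
    warnings.simplefilter("ignore")
    traced = torch.jit.trace(model, pos)  # only the "x + 100" path is recorded

scripted = torch.jit.script(model)        # both branches are compiled

eager_out = model(neg)
traced_out = traced(neg)      # replays "x + 100" even though the sum is negative
scripted_out = scripted(neg)  # follows the same branch as eager mode

print("eager:   ", eager_out)
print("traced:  ", traced_out)
print("scripted:", scripted_out)
```

If the traced output differs from the eager one on an input other than the one used for tracing, baked-in control flow is a likely cause.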

In addition to what @ptrblck said: if deploying the whole model doesn’t work immediately, I usually slice the model into parts and JIT them incrementally (the first part, then the first and second, and so on) to see at which stage the results start to diverge.
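
A rough sketch of that incremental approach, under the assumption that the model can be expressed as a list of stages that each take and return a single tensor. The helper `first_divergent_stage` and the toy stages are hypothetical; real GPT-2 blocks take extra arguments and would need small wrapper modules. Note that the comparison uses a probe input different from the one used for tracing, since otherwise baked-in branches would go unnoticed.

```python
import warnings

import torch
from torch import nn


class Branchy(nn.Module):
    """Hypothetical stage with a data-dependent branch."""

    def forward(self, x):
        if x.sum() > 0:
            return x + 100
        return x - 100


def first_divergent_stage(stages, trace_input, probe_input, atol=1e-5):
    """Trace stages[:1], stages[:2], ... and return the index of the first
    stage at which traced and eager outputs differ, or None if none do."""
    for n in range(1, len(stages) + 1):
        prefix = nn.Sequential(*stages[:n]).eval()
        with torch.no_grad(), warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence TracerWarning
            traced = torch.jit.trace(prefix, trace_input)
            expected = prefix(probe_input)   # eager reference
            actual = traced(probe_input)     # traced result on the same input
        if not torch.allclose(expected, actual, atol=atol):
            return n - 1
    return None


stages = [nn.Tanh(), Branchy(), nn.ReLU()]
trace_input = torch.ones(4)
probe_input = -torch.ones(4)

print(first_divergent_stage(stages, trace_input, probe_input))
```

With these toy stages the helper reports index 1, the `Branchy` stage, because the first prefix (only `Tanh`) traces cleanly and the second does not.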

I’ll have to set up WSL, because Windows isn’t supported by torch.compile() yet.

Could you explain more?