I see PyTorch 1.5 has brought a lot of improvements to the C++ API. I have a model that I'm currently using from Python. Would I still need TorchScript as an intermediate step if I wanted to load this model from C++/Java? I assume so, because the model is defined in Python and has to be compiled by the JIT before other languages can load it. Please confirm.
Also, my current concern about TorchScript is committing to a fixed sentence length. My model is GPT-2, and I want to avoid padding as much as possible because I think pad tokens still go through the network's computation, though I'm not sure about this. Should I have multiple TorchScript models for different sentence lengths? I believe a single TorchScript model fixed at 1024 tokens could hurt performance for sentences with fewer than 100 tokens, since most of the input would then be pad tokens. If no computation is done for them, I'm safe. How is this usually handled?
But if you use TorchScript, the model should behave just like the Python version of your model. So you just need to make sure you don't add extra padding on the Python side.
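To make that concrete, here is a minimal sketch. It assumes a toy module (`TinyLM`, a hypothetical stand-in, not GPT-2) compiled with `torch.jit.script`. The same scripted module accepts inputs of different lengths, and padding an input to 1024 tokens produces outputs for all 1024 positions, so the pad tokens are computed like any other token in this toy model.

```python
import io

import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyLM(nn.Module):
    """Hypothetical stand-in for a language model; this is NOT GPT-2."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        # The sequence length comes from the input at run time,
        # so nothing here commits the model to a fixed length.
        return self.proj(self.embed(input_ids))


model = TinyLM().eval()
scripted = torch.jit.script(model)

# Two unpadded inputs of different lengths, fed to the same scripted module.
short = torch.randint(0, 100, (1, 7))
long = torch.randint(0, 100, (1, 300))

# The short input padded to 1024 tokens (pad id 0 is an assumption here).
padded = F.pad(short, (0, 1024 - short.shape[1]), value=0)

with torch.no_grad():
    out_short = scripted(short)
    out_long = scripted(long)
    out_padded = scripted(padded)

# Serialize and reload the scripted module, here via an in-memory buffer.
buffer = io.BytesIO()
torch.jit.save(scripted, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

with torch.no_grad():
    out_restored = restored(short)
```

In a real setup you would save to a file with `scripted.save(...)` and load that file from C++ with `torch::jit::load`. A real GPT-2 export may need more care than this toy, for example if it is produced with `torch.jit.trace` instead of `torch.jit.script`, so treat this as an illustration of the behaviour and not as an export recipe.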