I see PyTorch 1.5 has brought a lot of improvements to the C++ APIs. I have a model that I'm using in Python. Would I still need TorchScript as an intermediate step if I wanted to load this model from C++/Java? I assume so, since the model is defined in Python and needs to be compiled by the JIT before it can be loaded from other languages. Please confirm this.
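For context, here's roughly how I'd export the model — a minimal sketch, assuming the Hugging Face `transformers` implementation of GPT-2 (the file name is just something I made up for illustration):

```python
import torch
from transformers import GPT2LMHeadModel

# torchscript=True makes the model return tuples instead of dicts,
# which the tracer requires.
model = GPT2LMHeadModel.from_pretrained("gpt2", torchscript=True)
model.eval()

# Trace with a representative input; the saved ScriptModule can then
# be loaded from C++ via torch::jit::load, or from Java through the
# PyTorch JNI/Android bindings.
example_ids = torch.randint(0, 50257, (1, 32), dtype=torch.long)
traced = torch.jit.trace(model, example_ids)
traced.save("gpt2_traced.pt")
```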
Also, my current concern about TorchScript is committing to a fixed sentence length. My model is GPT-2, and I want to avoid padding as much as possible because I think pad tokens are still computed by the network (as far as I understand, the attention mask only stops them from influencing the real tokens; the FLOPs are still spent). I'm not sure about this point, though. Should I export multiple TorchScript models with different sentence lengths? I suspect a single TorchScript model fixed at 1024 tokens would hurt performance for sentences of fewer than 100 tokens, since the vast majority of tokens would be padding. If there's genuinely no computation for them, then I'll be safe. How is this usually handled?
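To illustrate what I'm hoping for — a hypothetical check, reusing the `gpt2_traced.pt` file from my sketch above — I'd like a single exported artifact that accepts whatever sequence length I feed it:

```python
import torch

# Load the exported model and probe it with different sequence lengths.
# If tracing baked the length into the graph, one of these calls should
# fail or produce wrong shapes; if it generalizes, one artifact is enough.
loaded = torch.jit.load("gpt2_traced.pt")
loaded.eval()

with torch.no_grad():
    for seq_len in (16, 100, 1024):
        ids = torch.randint(0, 50257, (1, seq_len), dtype=torch.long)
        logits = loaded(ids)[0]
        print(seq_len, logits.shape)  # expecting (1, seq_len, 50257)
```

If a single traced model generalizes like this, my plan would be to feed unpadded sequences one at a time, and when batching to group sentences of similar length so the padding overhead stays small — but I don't know if that's the standard practice.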