I tried to implement custom layers such as LSoftmax (arXiv), the components required for PNN (arXiv), and so forth in PyTorch, but model loading and prediction are significantly slower (execution time increases by 400-500%).
So, my question is: is there an efficient way to implement custom layers like these so that they run fast?
You should post your code. I guess PyTorch operators use CUDA and should be fast enough; however, if you are using plain Python constructs such as for loops and lists, your code is probably not optimal. The developers here are very good at writing code with PyTorch operators, so they may be able to help you optimize it.
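To illustrate the point about Python loops, here is a minimal sketch, not your actual layer. It assumes a hypothetical custom layer that needs the cosine between every input row and every class weight row (a quantity that margin-based softmax variants typically use). The function names `cosine_logits_loop` and `cosine_logits_vectorized` are made up for this example.

```python
import torch
import torch.nn.functional as F


def cosine_logits_loop(x, weight):
    # Slow pattern: nested Python loops issue many tiny tensor operations,
    # so interpreter and kernel-launch overhead dominates the runtime.
    batch, classes = x.shape[0], weight.shape[0]
    out = torch.empty(batch, classes, dtype=x.dtype, device=x.device)
    for i in range(batch):
        for j in range(classes):
            out[i, j] = torch.dot(x[i], weight[j]) / (x[i].norm() * weight[j].norm())
    return out


def cosine_logits_vectorized(x, weight):
    # Fast pattern: normalize all rows at once, then do a single matrix multiply.
    return F.normalize(x, dim=1) @ F.normalize(weight, dim=1).t()


torch.manual_seed(0)
x = torch.randn(8, 16)   # hypothetical batch of 8 feature vectors
w = torch.randn(5, 16)   # hypothetical weights for 5 classes

slow = cosine_logits_loop(x, w)
fast = cosine_logits_vectorized(x, w)
print(torch.allclose(slow, fast, atol=1e-5))
```

Both functions should produce the same values up to floating-point error, but the vectorized one runs as a handful of batched operators instead of one small operation per element, which is usually where the speed difference comes from, especially on a GPU.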
I think you can also implement things in C, but I have no experience with that.