How to improve model latency for web deployment without retraining the models? What is the checklist that I should mark to improve the model speed?
I have multiple models that process a video sequentially on one machine with one K80 GPU; each model takes around 5 mins to process a video that is 1 min long. What ideas and suggestions should I try to improve each model latency without changing the model architecture? How should I structure my thinking about this problem?