I am currently trying to perform multi-threaded inference on some edge devices, and I have encountered several issues. Can someone please guide me on how to make improvements?
My questions:
1. With multi-threaded inference, several threads sometimes return results within a very short interval, and at other times no thread returns a result for a long period. How can I make the timing of my inference results smoother?
2. With multi-threaded inference, a thread started later may finish before a thread started earlier, so frames complete out of order. If I discard these frames, my video FPS may decrease. If I use them, the order of my video stream may become scrambled.
3. With multi-threaded inference, CPU usage increases sharply, but my average inference speed does not improve proportionally. For example, 3 threads process a video at 22 FPS, while 18 threads reach only 25 FPS. When I inspect the thread pool, I can see that all 18 threads are active. How should I handle this problem?
4. For multi-threaded inference, should I create multiple model instances and use a different one for each thread, or should I share a single model across all threads?
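
To make the out-of-order situation in question 2 concrete, here is a minimal Python sketch of one possible approach: tag each frame with its index and hold early results in a small buffer until the frames before them have finished. The names `fake_infer` and `run_in_order` are hypothetical, and the `time.sleep` call only simulates variable inference time; it is not code for any particular inference framework.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_infer(frame_id):
    """Hypothetical stand-in for a model call; some later frames finish sooner."""
    time.sleep(0.05 * (4 - frame_id % 4))
    return frame_id


def run_in_order(frames, infer, num_workers=4):
    """Run infer on every frame in a thread pool, yielding results in input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Map each future back to the position of its frame in the input.
        futures = {pool.submit(infer, f): i for i, f in enumerate(frames)}
        buffered = {}    # position -> result, for frames that finished early
        next_index = 0   # position of the next result that may be emitted
        for fut in as_completed(futures):
            buffered[futures[fut]] = fut.result()
            # Emit every result that is now contiguous with what was already emitted.
            while next_index in buffered:
                yield buffered.pop(next_index)
                next_index += 1


if __name__ == "__main__":
    print(list(run_in_order(range(8), fake_infer)))
```

A buffer like this keeps every frame and preserves order, at the cost of extra latency whenever an early frame is slow, so it addresses ordering but not the uneven timing described in question 1.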