Hello.
Serving a single model in TorchServe with multiple workers increases throughput.
Below are the results of a throughput experiment varying the `default_workers_per_model` value and the PyTorch thread setting, `MKL_NUM_THREADS`.
As far as I know, a GPU only executes one kernel at a time.
I would appreciate advice on why throughput increases as `default_workers_per_model` increases.
| number_of_netty_threads | netty_client_threads | default_workers_per_model | MKL_NUM_THREADS | job_queue_size | throughput |
|---|---|---|---|---|---|
100 | 100 | 1 | 1 | 1000 | 78.1 |
100 | 100 | 1 | 2 | 1000 | 80.44166667 |
100 | 100 | 1 | 4 | 1000 | 80.65833333 |
100 | 100 | 1 | 8 | 1000 | 80.01666667 |
100 | 100 | 2 | 1 | 1000 | 128.2666667 |
100 | 100 | 2 | 2 | 1000 | 124.65 |
100 | 100 | 2 | 4 | 1000 | 126.6 |
100 | 100 | 2 | 8 | 1000 | 122.75 |
100 | 100 | 4 | 1 | 1000 | 149.2083333 |
100 | 100 | 4 | 2 | 1000 | 146.875 |
100 | 100 | 4 | 4 | 1000 | 148.3666667 |
100 | 100 | 4 | 8 | 1000 | 145.0916667 |
100 | 100 | 8 | 1 | 1000 | 148.875 |
100 | 100 | 8 | 2 | 1000 | 149.75 |
100 | 100 | 8 | 4 | 1000 | 148.6166667 |
100 | 100 | 8 | 8 | 1000 | 149.0666667 |
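For reference, this is roughly how one configuration from the table above was set up. The property keys are standard TorchServe `config.properties` keys; the model name and paths are placeholders, and `MKL_NUM_THREADS` was passed as an environment variable when starting the server:

```properties
# config.properties — one experiment configuration (default_workers_per_model=4 row)
number_of_netty_threads=100
netty_client_threads=100
default_workers_per_model=4
job_queue_size=1000
```

Started with, for example, `MKL_NUM_THREADS=1 torchserve --start --ts-config config.properties --model-store model_store --models my_model.mar` (model name and store path are placeholders).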