Hello.
Serving a single model in TorchServe with multiple workers increases throughput.
Below are the results of a throughput experiment varying the `default_workers_per_model` value and the PyTorch thread setting, `MKL_NUM_THREADS`.
As far as I know, a GPU only executes one kernel at a time.
I would appreciate advice on why throughput increases as `default_workers_per_model` increases.
| number_of_netty_threads | netty_client_threads | default_workers_per_model | MKL_NUM_THREADS | job_queue_size | throughput |
|---|---|---|---|---|---|
100 | 100 | 1 | 1 | 1000 | 78.1 |
100 | 100 | 1 | 2 | 1000 | 80.44166667 |
100 | 100 | 1 | 4 | 1000 | 80.65833333 |
100 | 100 | 1 | 8 | 1000 | 80.01666667 |
100 | 100 | 2 | 1 | 1000 | 128.2666667 |
100 | 100 | 2 | 2 | 1000 | 124.65 |
100 | 100 | 2 | 4 | 1000 | 126.6 |
100 | 100 | 2 | 8 | 1000 | 122.75 |
100 | 100 | 4 | 1 | 1000 | 149.2083333 |
100 | 100 | 4 | 2 | 1000 | 146.875 |
100 | 100 | 4 | 4 | 1000 | 148.3666667 |
100 | 100 | 4 | 8 | 1000 | 145.0916667 |
100 | 100 | 8 | 1 | 1000 | 148.875 |
100 | 100 | 8 | 2 | 1000 | 149.75 |
100 | 100 | 8 | 4 | 1000 | 148.6166667 |
100 | 100 | 8 | 8 | 1000 | 149.0666667 |
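For reference, this is roughly how one configuration from the table above was set up. The property keys are standard TorchServe `config.properties` keys; the model name and paths are placeholders, and `MKL_NUM_THREADS` was passed as an environment variable when starting the server:

```properties
# config.properties — one experiment configuration (default_workers_per_model=4 row)
number_of_netty_threads=100
netty_client_threads=100
default_workers_per_model=4
job_queue_size=1000
```

Started with, for example, `MKL_NUM_THREADS=1 torchserve --start --ts-config config.properties --model-store model_store --models my_model.mar` (model name and store path are placeholders).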