Question about running a single model from a single GPU to multi workers

Hello.
Serving a single model in TorchServe with multiple workers increases throughput.
Below is a throughput experiment varying the default_workers_per_model value and PyTorch's thread setting, MKL_NUM_THREADS.
As far as I know, a GPU executes only one CUDA context's work at a time,
so I would appreciate advice on why throughput increases as default_workers_per_model increases.

| number_of_netty_threads | netty_client_threads | default_workers_per_model | MKL_NUM_THREADS | job_queue_size | throughput |
|---|---|---|---|---|---|
| 100 | 100 | 1 | 1 | 1000 | 78.10 |
| 100 | 100 | 1 | 2 | 1000 | 80.44 |
| 100 | 100 | 1 | 4 | 1000 | 80.66 |
| 100 | 100 | 1 | 8 | 1000 | 80.02 |
| 100 | 100 | 2 | 1 | 1000 | 128.27 |
| 100 | 100 | 2 | 2 | 1000 | 124.65 |
| 100 | 100 | 2 | 4 | 1000 | 126.60 |
| 100 | 100 | 2 | 8 | 1000 | 122.75 |
| 100 | 100 | 4 | 1 | 1000 | 149.21 |
| 100 | 100 | 4 | 2 | 1000 | 146.88 |
| 100 | 100 | 4 | 4 | 1000 | 148.37 |
| 100 | 100 | 4 | 8 | 1000 | 145.09 |
| 100 | 100 | 8 | 1 | 1000 | 148.88 |
| 100 | 100 | 8 | 2 | 1000 | 149.75 |
| 100 | 100 | 8 | 4 | 1000 | 148.62 |
| 100 | 100 | 8 | 8 | 1000 | 149.07 |
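For context, this is roughly how one row of the sweep maps onto TorchServe settings. The first four keys are real config.properties options; MKL_NUM_THREADS is an environment variable read by PyTorch/MKL, not a config.properties key. The exact values here are just one sample row from the table:

```
# config.properties (sketch, values from one row of the experiment)
number_of_netty_threads=100
netty_client_threads=100
default_workers_per_model=8
job_queue_size=1000
```

MKL_NUM_THREADS would then be exported in the shell before starting TorchServe, e.g. `MKL_NUM_THREADS=1 torchserve --start --ts-config config.properties`.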