TorchServe workflow gets stuck with more than 14 Python async requests

Hi, I am running a TorchServe container with a workflow (pipeline) that chains 2 models.
When I send Python async requests to the pipeline, anything above 14 concurrent requests makes TorchServe hang for a long time and then fail.
But if I send the same async requests directly to one of the models, it works fine even for 1000 requests.
The error that shows up in the logs is: "Number or consecutive unsuccessful inference 1"
I suspect some of my configuration is wrong.
Can someone point me toward a solution? :slight_smile:
(This is on a machine with 1 GPU.)

    async def execute_model(self, model_name, data):
        # Single-model prediction: POST /predictions/{model_name}
        post_url = f"http://{self.ip}:{self.infer_port}/predictions/{model_name}"
        async with aiohttp.ClientSession() as session:
            async with session.post(post_url, json=data) as response:
                try:
                    response_data = await response.text()
                    if response.status != 200:
                        print(response.status)
                    return response_data
                except aiohttp.ContentTypeError:
                    print("Unexpected response content type:", response.headers.get("Content-Type"))
                    print("Response body:", await response.text())

# usage (inside an async context):
tasks = [p.execute_model('detector', data) for _ in range(request_number)]
await asyncio.gather(*tasks)
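For context, these methods sit on a small client class; a minimal, self-contained sketch of how I drive the per-model calls is below (the class name, host, ports, and payload are placeholders, not my exact script):

import asyncio
import aiohttp

class TorchServeClient:
    # Hypothetical minimal client; ip/port correspond to the inference_address
    # in config.properties further down.
    def __init__(self, ip="127.0.0.1", infer_port=8087):
        self.ip = ip
        self.infer_port = infer_port

    async def execute_model(self, model_name, data):
        post_url = f"http://{self.ip}:{self.infer_port}/predictions/{model_name}"
        async with aiohttp.ClientSession() as session:
            async with session.post(post_url, json=data) as response:
                return await response.text()

async def main():
    p = TorchServeClient()
    data = {"dummy": "payload"}   # placeholder request body
    request_number = 100
    tasks = [p.execute_model("detector", data) for _ in range(request_number)]
    results = await asyncio.gather(*tasks)
    print(f"got {len(results)} responses")

if __name__ == "__main__":
    asyncio.run(main())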
    async def execute_pipeline_async(self, data, pipeline_name):
        # Workflow prediction: POST /wfpredict/{pipeline_name}
        post_url = f"http://{self.ip}:{self.infer_port}/wfpredict/{pipeline_name}"
        async with aiohttp.ClientSession() as session:
            async with session.post(post_url, json=data) as response:
                response_data = await response.text()
                if response.status != 200:
                    print(response.status)
                return response_data

# usage (inside an async context):
tasks = [p.execute_pipeline_async(data, pipeline_name) for _ in range(request_number)]
results = await asyncio.gather(*tasks)
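Since the direct-model path copes with 1000 requests, one test I can run is capping client-side concurrency on the workflow path and seeing where it starts to hang; a minimal sketch using an asyncio.Semaphore (the limit value and the p client object are placeholders):

import asyncio

async def execute_pipeline_limited(p, data, pipeline_name, sem):
    # At most max_in_flight workflow requests are in flight at any moment.
    async with sem:
        return await p.execute_pipeline_async(data, pipeline_name)

async def run_limited(p, data, pipeline_name, request_number, max_in_flight=8):
    sem = asyncio.Semaphore(max_in_flight)
    tasks = [
        execute_pipeline_limited(p, data, pipeline_name, sem)
        for _ in range(request_number)
    ]
    return await asyncio.gather(*tasks)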

detector_embedder_pipeline.yaml

models:
    min-workers: 1
    max-workers: 1
    batch-size: 4
    max-batch-delay: 100
    retry-attempts: 4
    timeout-ms: 300000

    detector:
      url: detector.mar

    embedder:
      url: embedder.mar

dag:
  detector: [prep_intermediate_input]
  prep_intermediate_input: [embedder]
  embedder: [post_processing]
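For completeness, this is roughly how the workflow gets registered against the management API (the .war is built with torch-workflow-archiver from the spec above and the handler below; archive name and host are placeholders, so treat this as a sketch):

import requests

MGMT = "http://127.0.0.1:8088"  # management_address from config.properties

# Register the workflow archive; the archive name is a placeholder.
resp = requests.post(f"{MGMT}/workflows", params={"url": "detector_embedder_pipeline.war"})
print(resp.status_code, resp.text)

# Describe the registered workflow to confirm the spec TorchServe actually loaded.
print(requests.get(f"{MGMT}/workflows/detector_embedder_pipeline").json())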

detector_embedder_pipeline.py

import json
import logging


def post_processing(data, context):
    '''
    Changes the output keys obtained from the individual model
    to be more appropriate for the workflow output
    '''
    logging.info("Start post_processing for pipeline...")

    processed_data = []
    if data:
        for output in data:
            if isinstance(output, list):
                output = output[0]
            output_message = output.get("data") or output.get("body")
            output_message = json.loads(output_message)
            processed_data.append(output_message)
    logging.info("Finish post_processing for pipeline...")

    return processed_data


def prep_intermediate_input(messages, context):
    '''
    Extracts the detections from the output of the first model (detector)
    and converts them into the structure expected by the second model (embedder)
    '''
    logging.info("Start prep_intermediate_input...")

    processed_data = []
    if messages:
        for row in messages:
            data = row.get("data") or row.get("body")
            if isinstance(data, str):
                try:
                    # Attempt to parse JSON
                    data_dict = json.loads(data)
                except json.JSONDecodeError as e:
                    print("Error decoding JSON:", e)
                    continue
            elif isinstance(data, bytearray):
                # If it's a bytearray, decode it to a string assuming it's UTF-8 encoded
                try:
                    data_str = data.decode('utf-8')
                except UnicodeDecodeError as e:
                    print("Error decoding bytearray:", e)
                    continue
                try:
                    data_dict = json.loads(data_str)
                except json.JSONDecodeError as e:
                    print("Error decoding JSON from bytearray:", e)
                    continue
            elif isinstance(data, dict):
                # If it's already a dictionary, no need to convert
                data_dict = data
            else:
                print("Invalid data format:", type(data))
                continue
            # Build the detection object expected by the embedder
            try:
                detection_obj = {
                    "image_name": data_dict[0]['image_name'],
                    "detections": [
                        {
                            "bbox": data_dict[0]['bbox'],
                            "kps": data_dict[0]['kps'],
                            "det_score": data_dict[0]['det_score']
                        }
                    ]
                }
                processed_data.append(detection_obj)
            except KeyError as e:
                print("KeyError:", e)
            logging.info("Finish prep_intermediate_input...")

            return processed_data
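To sanity-check the handler outside TorchServe, I can feed prep_intermediate_input a fabricated detector output row; a quick local sketch (the detection values are made up, and it assumes the file is importable as detector_embedder_pipeline):

import json
from detector_embedder_pipeline import prep_intermediate_input

# Hypothetical detector output row, shaped like what the workflow hands over.
sample_row = {
    "body": json.dumps([{
        "image_name": "img_001.jpg",
        "bbox": [10, 20, 110, 220],
        "kps": [[30, 40], [50, 60]],
        "det_score": 0.98,
    }])
}

print(prep_intermediate_input([sample_row], context=None))
# expected: [{'image_name': 'img_001.jpg', 'detections': [{'bbox': ..., 'kps': ..., 'det_score': ...}]}]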

config.properties

inference_address=http://0.0.0.0:8087
management_address=http://0.0.0.0:8088
metrics_address=http://0.0.0.0:8089
number_of_netty_threads=32
job_queue_size=1000
model_store=/workspace/model-store
workflow_store=/workspace/wf-store
default_workers_per_model=8
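To rule out a worker-count mismatch between default_workers_per_model and the workflow's min/max-workers, I can ask the management API what is actually loaded; a quick sketch (host is a placeholder, port is the management_address above):

import requests

MGMT = "http://127.0.0.1:8088"  # management_address from config.properties

# The workflow registers its models under namespaced names
# (e.g. detector_embedder_pipeline__detector); list them and count their workers.
for m in requests.get(f"{MGMT}/models").json().get("models", []):
    detail = requests.get(f"{MGMT}/models/{m['modelName']}").json()
    print(m["modelName"], "workers:", len(detail[0].get("workers", [])))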

TorchServe logs

2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-20 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@7483f41b
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-18 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@1434c728
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-9 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@533790ae
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-5 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@67651088
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-8 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@17e8cb1e
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-10 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@87d51a
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-4 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@5242ec65
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-17 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@147d04fd
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-7 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@30ab9b00
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-13 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@79472958
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-11 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@681f4290
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-16 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@6c338e33
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-22 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@642c37b4
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-6 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@d014a21
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-12 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@455c75a1
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-14 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@7a9f86f0
2024-04-03T08:44:37,860 [INFO ] epollEventLoopGroup-3-19 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@498faf3b
2024-04-03T08:44:37,872 [INFO ] epollEventLoopGroup-3-15 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@2ad14e76
2024-04-03T08:44:37,873 [INFO ] epollEventLoopGroup-3-21 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@2ccf1e9b
2024-04-03T08:44:37,878 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,879 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,879 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,880 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,880 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,880 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,880 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,880 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,881 [DEBUG] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1712133877880
2024-04-03T08:44:37,881 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,881 [INFO ] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1712133877881
2024-04-03T08:44:37,881 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,881 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,882 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,882 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,882 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,883 [INFO ] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Invoking -  detector for attempt 0
2024-04-03T08:44:37,887 [INFO ] epollEventLoopGroup-3-23 org.pytorch.serve.http.api.rest.InferenceRequestHandler - org.pytorch.serve.util.messages.RequestInput@670ebacc
2024-04-03T08:44:43,780 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,780 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:205.1998405456543|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,780 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:238.3367919921875|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,780 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:53.7|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,780 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:32.3486328125|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,781 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:1325.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,781 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:3.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,781 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:4122.87890625|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,781 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:10368.92578125|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883
2024-04-03T08:44:43,781 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:73.8|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133883

2024-04-03T08:45:43,610 [INFO ] pool-3-thread-2 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - DiskAvailable.Gigabytes:205.19893264770508|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - DiskUsage.Gigabytes:238.33769989013672|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - DiskUtilization.Percent:53.7|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUtilization.Percent:32.3486328125|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUsed.Megabytes:1325.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:4304.66796875|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,611 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:10367.53125|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:45:43,612 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:72.6|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712133943
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - DiskAvailable.Gigabytes:205.1987533569336|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - DiskUsage.Gigabytes:238.3378791809082|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - DiskUtilization.Percent:53.7|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUtilization.Percent:32.3486328125|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,608 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUsed.Megabytes:1325.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,609 [INFO ] pool-3-thread-2 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,609 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:4333.3984375|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,609 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:10362.09375|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:46:43,609 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:72.4|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134003
2024-04-03T08:47:43,610 [INFO ] pool-3-thread-2 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,610 [INFO ] pool-3-thread-2 TS_METRICS - DiskAvailable.Gigabytes:205.19869995117188|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,610 [INFO ] pool-3-thread-2 TS_METRICS - DiskUsage.Gigabytes:238.33793258666992|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - DiskUtilization.Percent:53.7|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUtilization.Percent:32.3486328125|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUMemoryUsed.Megabytes:1325.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - MemoryAvailable.Megabytes:4333.3046875|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,611 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUsed.Megabytes:10366.5390625|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:43,612 [INFO ] pool-3-thread-2 TS_METRICS - MemoryUtilization.Percent:72.4|#Level:Host|#hostname:caddc5b40c7b,timestamp:1712134063
2024-04-03T08:47:57,882 [ERROR] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 1
2024-04-03T08:47:57,882 [ERROR] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
	at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:242) [model-server.jar:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2024-04-03T08:47:57,902 [DEBUG] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 2086579, Inference time ns: 200023664831
2024-04-03T08:47:57,903 [INFO ] W-29502-detector_embedder_pipeline__detector_1.0-stdout MODEL_LOG - Frontend disconnected.
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - org.pytorch.serve.http.InternalServerException: Worker died.
2024-04-03T08:47:57,903 [DEBUG] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 853103, Inference time ns: 200023230958
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - org.pytorch.serve.http.InternalServerException: Worker died.
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - org.pytorch.serve.http.InternalServerException: Worker died.
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Timed out while executing detector for attempt 0
2024-04-03T08:47:57,903 [DEBUG] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 586418, Inference time ns: 200023235250
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - Timed out while executing detector for attempt 0
2024-04-03T08:47:57,903 [INFO ] epollEventLoopGroup-5-5 org.pytorch.serve.wlm.WorkerThread - 29502 Worker disconnected. WORKER_MODEL_LOADED
2024-04-03T08:47:57,903 [DEBUG] W-29502-detector_embedder_pipeline__detector_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 219579, Inference time ns: 200023277371
2024-04-03T08:47:57,903 [ERROR] wf-execute-thread-0 org.pytorch.serve.ensemble.DagExecutor - org.pytorch.serve.http.InternalServerException: Worker died.