Trouble deploying Yolov5 model in Torch Serve environment

I am trying to deploy the Yolov5 model in the Torch Serve environment but on sending the below request to the endpoint I am not getting any response, the request seems to get stuck

$ curl http://127.0.0.1:8080/predictions/yolov5 -T /home/atinesh/Desktop/COCO_val2014_000000562557.jpg

Command to compile and deploy the model

Starting container

$ sudo docker run -d --rm -it \
-p 8080:8080 -p 8081:8081 -p 8082:8082 \
--name torchserve-cpu \
-v $(pwd)/model-server/model-store:/home/model-server/model-store \
-v $(pwd)/model-server/examples:/home/model-server/examples \
pytorch/torchserve:latest

Logging in the container and compiling .mar file using Torch Model Archiver

$ sudo docker exec -u root -it torchserve-cpu /bin/bash

root@<container-id>:<path># torch-model-archiver --model-name yolov5 \
--version 1.0 \
--serialized-file /home/model-server/examples/object_detector/yolov5/yolov5x.pt \
--handler /home/model-server/examples/object_detector/yolov5/torchserve_handler.py \
--export-path /home/model-server/model-store \
--extra-files /home/model-server/examples/object_detector/yolov5/torchserve_handler.py

Registering the model
$ curl -X POST "http://localhost:8081/models?url=/home/atinesh/model-server/model-store/yolov5.mar"

Scaling the worker
$ curl -v -X PUT "http://localhost:8081/models/yolov5?min_worker=1"

Deployed model is running properly, on checking model health I am getting below response

$ curl "http://localhost:8081/models/yolov5"
[
  {
    "modelName": "yolov5",
    "modelVersion": "1.0",
    "modelUrl": "/home/atinesh/model-server/model-store/yolov5.mar",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "loadedAtStartup": false,
    "workers": [
      {
        "id": "9001",
        "startTime": "2021-11-11T05:09:57.190Z",
        "status": "READY",
        "memoryUsage": 0,
        "pid": 507,
        "gpu": false,
        "gpuUsage": "N/A"
      }
    ]
  }
]

Other information

Yolov5 (XLarge) model is trained on custom COCO dataset to detect 2 objects person & bicycle, below is the link of the trained model file
yolov5x.pt

Image used for Inference: COCO_val2014_000000562557.jpg

I have tried two different handler files handler #1, handler #2 but the same issue persists.

Hi @atinesh do you mind taking a look at the logs/model_log.log file and see if that helps debug the issue.

Some guesses as to the issue

My first thought was that inference time could be really slow but looks like model is ~150MB so should be fine, does this stuckness persist without torchserve? Can you try removing the batch delay and increasing the number of workers

Finally if you’re doing CPU inference it really helps to set torch.set_num_threads(1) when initializing a model for infererence in your handler otherwise the inference can take a while. I’d also check whether the preprocessing is taking a long time so just some general profiling would help as well