I’m trying to deploy a model (base image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.6-gpu-py312) to a SageMaker endpoint. In my model_fn() I’m loading some datasets into memory. However, every time, TorchServe throws an error roughly two minutes after the worker thread starts:
[ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
[ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
I have tried adding a model-config.yaml file with responseTimeout: 1200, but that didn’t seem to make a difference. Are there other ways to increase the TorchServe worker timeout?
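For context, this is roughly what I’m planning to try next: passing timeout overrides as environment variables on the model at deployment time. The variable names (SAGEMAKER_MODEL_SERVER_TIMEOUT from the SageMaker inference toolkit, and the TS_-prefixed override of TorchServe’s default_response_timeout) are my reading of the docs, so treat them as assumptions to verify against this particular container:

```python
# Sketch: environment variables to pass to the SageMaker model so the serving
# stack allows a longer worker startup/response time. Names are assumptions
# based on the sagemaker-inference toolkit and TorchServe docs.
env = {
    # Read by the SageMaker inference toolkit; forwarded to TorchServe's
    # default_response_timeout (value is in seconds, default is 60).
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "1200",
    # TorchServe also accepts TS_-prefixed environment overrides of
    # config.properties keys, e.g. default_response_timeout.
    "TS_DEFAULT_RESPONSE_TIMEOUT": "1200",
}

# Passed at deployment, e.g. (requires the sagemaker SDK, shown for context):
# model = PyTorchModel(..., env=env)
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
print(env)
```

If the per-model model-config.yaml isn’t being picked up, it may be a packaging/location issue inside model.tar.gz rather than the value itself, so an endpoint-level override like this seemed worth trying.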