I’m trying to deploy a model (base image: 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:2.6-gpu-py312) to a SageMaker endpoint. In my model_fn() I’m loading some datasets into memory. However, every time, TorchServe throws an error roughly two minutes after the worker thread starts:
[ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Number or consecutive unsuccessful inference 2
[ERROR] W-9000-model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Backend worker did not respond in given time
I have tried adding a model-config.yaml file with responseTimeout: 1200, but that didn’t seem to make a difference. Are there other ways to increase the TorchServe worker timeout?
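For context, this is roughly what I’m planning to try next: passing timeout overrides as environment variables on the model at deployment time. The variable names (SAGEMAKER_MODEL_SERVER_TIMEOUT from the SageMaker inference toolkit, and the TS_-prefixed override of TorchServe’s default_response_timeout) are my reading of the docs, so treat them as assumptions to verify against this particular container:

```python
# Sketch: environment variables to pass to the SageMaker model so the serving
# stack allows a longer worker startup/response time. Names are assumptions
# based on the sagemaker-inference toolkit and TorchServe docs.
env = {
    # Read by the SageMaker inference toolkit; forwarded to TorchServe's
    # default_response_timeout (value is in seconds, default is 60).
    "SAGEMAKER_MODEL_SERVER_TIMEOUT": "1200",
    # TorchServe also accepts TS_-prefixed environment overrides of
    # config.properties keys, e.g. default_response_timeout.
    "TS_DEFAULT_RESPONSE_TIMEOUT": "1200",
}

# Passed at deployment, e.g. (requires the sagemaker SDK, shown for context):
# model = PyTorchModel(..., env=env)
# predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.xlarge")
print(env)
```

If the per-model model-config.yaml isn’t being picked up, it may be a packaging/location issue inside model.tar.gz rather than the value itself, so an endpoint-level override like this seemed worth trying.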