Every once in a while a TorchServe worker dies with the following message: io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 16. When I rerun the request in question, it completes without problems.
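Since rerunning the request succeeds, I currently work around the problem with a client-side retry. A minimal sketch of that retry logic (the `send_request` callable is a placeholder for whatever performs the actual HTTP POST to the TorchServe predictions endpoint):

```python
import time


def infer_with_retry(send_request, max_attempts=3, backoff_s=0.5):
    """Call send_request(); retry if it raises.

    send_request is any zero-argument callable that performs the
    inference call and raises on a connection error / 5xx response.
    """
    last_exc = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send_request()
        except Exception as exc:  # in real code, catch the specific HTTP/connection errors
            last_exc = exc
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
    raise last_exc
```

This masks the symptom but obviously doesn't explain why the worker dies in the first place.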
I have tried to solve the issue by increasing the request and response size limits in config.properties, but this hasn't helped. The responses from TorchServe are actually very small, around 0.5 kB:
async_logging=true
max_response_size=1000000000
max_request_size=1000000000
Additional information:
I am using the docker image pytorch/torchserve:0.7.1-gpu to run a few BERT models on GPU.
Full TorchServe config.properties:
load_models=all
async_logging=true
max_response_size=1000000000
max_request_size=1000000000
models={\
"price": {\
"1.5": {\
"defaultVersion": true,\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 1,\
"maxBatchDelay": 100\
}\
},\
"categorization": {\
"1.5": {\
"defaultVersion": true,\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 1,\
"maxBatchDelay": 100\
}\
},\
"text_quality": {\
"1.4": {\
"defaultVersion": true,\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 1,\
"maxBatchDelay": 100\
}\
},\
"text_similarity_v1": {\
"1.0": {\
"defaultVersion": true,\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 16,\
"maxBatchDelay": 100\
}\
},\
"text_similarity_v2": {\
"1.0": {\
"defaultVersion": true,\
"minWorkers": 1,\
"maxWorkers": 1,\
"batchSize": 16,\
"maxBatchDelay": 100\
}\
}\
}
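Aside: the multi-line `models` value must form valid JSON once the trailing-backslash line continuations are joined, so I verified the config isn't a syntax problem with a quick parse check (a hypothetical helper, not part of TorchServe):

```python
import json


def parse_models_property(raw: str) -> dict:
    """Join properties-file line continuations and parse the result as JSON."""
    joined = raw.replace("\\\n", "")  # drop backslash-newline continuations
    return json.loads(joined)


# Abbreviated example in the same continuation style as config.properties.
raw = '{\\\n"price": {\\\n"1.5": {"defaultVersion": true, "minWorkers": 1}\\\n}\\\n}'
models = parse_models_property(raw)
```

The full `models` value from the config above parses cleanly this way, so the error doesn't seem to come from a malformed model registration.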
Full TorchServe error message:
2023-04-27T09:58:56.368+02:00 2023-04-27T07:58:56,360 [ERROR] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - Unknown exception
2023-04-27T09:58:56.368+02:00 io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 16
2023-04-27T09:58:56.368+02:00 Consider increasing the 'max_response_size' in 'config.properties' to fix.
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.CodecUtils.readLength(CodecUtils.java:24) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.CodecUtils.readMap(CodecUtils.java:54) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:73) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:276) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.epoll.EpollDomainSocketChannel$EpollDomainUnsafe.epollInReady(EpollDomainSocketChannel.java:138) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at java.lang.Thread.run(Thread.java:833) [?:?]
2023-04-27T09:58:56.368+02:00 2023-04-27T07:58:56,361 [INFO ] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - 9003 Worker disconnected. WORKER_MODEL_LOADED
2023-04-27T09:58:56.368+02:00 2023-04-27T07:58:56,361 [ERROR] epollEventLoopGroup-5-4 org.pytorch.serve.wlm.WorkerThread - Unknown exception
2023-04-27T09:58:56.368+02:00 io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 16
2023-04-27T09:58:56.368+02:00 Consider increasing the 'max_response_size' in 'config.properties' to fix.
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.CodecUtils.readLength(CodecUtils.java:24) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.CodecUtils.readMap(CodecUtils.java:54) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at org.pytorch.serve.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:73) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:501) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:440) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:404) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:371) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:354) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:241) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:262) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:248) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:819) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:384) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[model-server.jar:?]
2023-04-27T09:58:56.368+02:00 at java.lang.Thread.run(Thread.java:833) [?:?]