I had a PaddleOCR model and converted it to a PyTorch model. I'm trying to deploy it so that I get high throughput and low latency.
The bottom line of what I have is ModelA → compute on CPU → ModelB, and I want to deploy it in the most optimal manner.
I know AWS Neuron can trace a single model graph and "split" it so that its partitions run on different NeuronCores.
This got me wondering whether I could hack ModelA, the CPU compute, and ModelB together into one graph. Does this sound like a bad idea?
Also, if you had to deploy something like this, how would you do it? (Just to constrain the problem, let's assume it all has to run in one container.)
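For context, the alternative I've been weighing instead of fusing everything into one graph is keeping the three stages separate and pipelining them with queues, so ModelA, the CPU step, and ModelB each stay busy on their own worker (the two model workers would each hold a Neuron-compiled model pinned to its own core). Here's a rough sketch with dummy stages standing in for the real models; all the names and the toy computations are just placeholders:

```python
import queue
import threading

# Placeholder stages standing in for the real pipeline -- in the actual
# deployment, model_a and model_b would be Neuron-compiled callables and
# cpu_compute would be the preprocessing between them.
def model_a(x):
    return x + 1

def cpu_compute(x):
    return x * 2

def model_b(x):
    return x - 3

SENTINEL = object()  # signals end-of-stream through the pipeline

def stage(fn, inq, outq):
    """Pull items from inq, apply fn, push results to outq; forward the sentinel."""
    while True:
        item = inq.get()
        if item is SENTINEL:
            outq.put(SENTINEL)
            break
        outq.put(fn(item))

def run_pipeline(items):
    # One queue between each pair of stages, one thread per stage.
    q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
    stages = [(model_a, q0, q1), (cpu_compute, q1, q2), (model_b, q2, q3)]
    threads = [threading.Thread(target=stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for x in items:
        q0.put(x)
    q0.put(SENTINEL)
    results = []
    while True:
        out = q3.get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

print(run_pipeline([1, 2, 3]))  # [1, 3, 5]
```

The point of the queues is that once the pipeline is warm, ModelA can work on request N+2 while the CPU step handles N+1 and ModelB handles N, which is basically the throughput story I'm after; latency per request stays roughly the sum of the three stages.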