Load multiple models for multi-process inference

Here is my code. It runs in parallel, but each inference takes over a thousand times longer.

Without multiprocessing, each inference takes about 20 ms;
with multiprocessing, each inference takes about 25 s.

import time
from multiprocessing import Manager, Process

import torch

def load_model():
    # Build one OCR model from the config file and mark it idle.
    conf_obj = _get_config_obj("tests/ocr/ocr.yml")
    model_manage = OCRModel(conf_obj)
    model_manage.load_model()
    model_manage.status = MODEL_MANAGE_IDLE
    return model_manage


def init_multi_load() -> None:
    # Load three model instances into the shared list created in __main__.
    for i in range(3):
        model_manage_list.append(load_model())


def infer(data: dict, model_manage_list) -> dict:
    # Pick the first idle model, mark it busy, run inference, then release it.
    for model_manage in model_manage_list:
        if model_manage.status == MODEL_MANAGE_IDLE:
            model_manage.status = MODEL_MANAGE_WORKING
            ret = model_manage.run_model(data)
            model_manage.status = MODEL_MANAGE_IDLE
            return ret
    raise ValueError("no idle model for service")


if __name__ == '__main__':
    torch.multiprocessing.set_start_method('spawn', force=True)
    model_manage_list = Manager().list()  # proxy list shared across processes
    init_multi_load()
    data = {}  # placeholder input, just for the showcase here
    for i in range(3):
        proc = Process(target=infer, args=(data, model_manage_list))
        proc.start()

    # Keep the main process alive so the workers can run.
    while True:
        time.sleep(2)

What hardware is running the model? If the processes are, e.g., sharing a single GPU, the increased contention could slow things down.
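
One way to check is to time each worker's forward pass with explicit CUDA synchronization, so that work queued by the other processes shows up in the measured latency. A minimal sketch, assuming a loaded model and an input batch already on the GPU (`timed_inference` is a hypothetical helper, not from this thread):

```python
import time
import torch

def timed_inference(model, batch):
    """Hypothetical helper: time one forward pass in milliseconds,
    synchronizing so pending and in-flight GPU work is included."""
    torch.cuda.synchronize()   # drain work already queued on the GPU
    start = time.perf_counter()
    with torch.no_grad():
        out = model(batch)
    torch.cuda.synchronize()   # wait for this forward pass to finish
    return out, (time.perf_counter() - start) * 1000.0
```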

:pray: I have just a single GPU in my computer.


Is there a way to parallelize on a single GPU? I think my GPU still has a lot of unused capacity.

How similar are the different models on your GPU? If they are similar and have relatively simple building blocks, you might look into, e.g., using batched matmul (torch.bmm — PyTorch 1.13 documentation) in place of linear layers, and grouped convolutions in place of "vanilla" convolutions, to do multiple models' worth of computation at a time per layer. Of course, you would need to be careful about keeping normalization statistics separate across the models.
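
A minimal sketch of that idea, assuming three identical models whose layers are plain linear and convolutional blocks (all names and shapes below are illustrative, not taken from the thread):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Illustrative sizes: three identical "models", each with a 128 -> 64 linear layer.
num_models, in_features, out_features, batch = 3, 128, 64, 16

# Stack one weight/bias per model along a leading "model" dimension.
weights = torch.randn(num_models, in_features, out_features, device=device)
biases = torch.randn(num_models, 1, out_features, device=device)

# One input batch per model: (num_models, batch, in_features).
x = torch.randn(num_models, batch, in_features, device=device)

# A single batched matmul runs all three linear layers in one kernel launch.
y = torch.bmm(x, weights) + biases  # -> (num_models, batch, out_features)

# Convolutional analogue: concatenate the per-model inputs along the channel
# dimension and set groups=num_models so each group uses its own model's filters.
conv = nn.Conv2d(num_models * 8, num_models * 16,
                 kernel_size=3, padding=1, groups=num_models).to(device)
imgs = torch.randn(batch, num_models * 8, 32, 32, device=device)
feat = conv(imgs)  # -> (batch, num_models * 16, 32, 32)
```

Keeping each model's weights in its own slice (or channel group) keeps the parameters separate; only the computation is fused into one kernel per layer.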

Hi, the models are exactly the same, but torch.bmm doesn't work for me. Do you have an example that loads multiple models in one process and runs inference in parallel with subprocesses?