Extracting key-value pairs from documents with different layouts

Assume I have an OCR engine that can detect the bounding boxes of sentences and characters and recognize their text, but it does not know how to associate keys with their values. How could I train a model to associate key-value pairs? Which direction should I research, and what are the recommended solutions today?
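One common way to frame this (an assumption on my part, not something the OCR engine provides) is link prediction: treat every OCR segment as a node, generate every ordered candidate pair (key, value), and train a binary classifier to say whether the pair is linked. The minimal sketch below only builds the training rows from bounding boxes. `Box`, `pair_features`, and `candidate_pairs` are hypothetical names, and the feature set (relative offset, same-line overlap, same-column overlap, trailing colon) is an illustrative choice, not a recommended or complete one.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Box:
    """Hypothetical container for one OCR text segment and its bounding box (pixels)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float

    @property
    def cx(self) -> float:
        return (self.x0 + self.x1) / 2

    @property
    def cy(self) -> float:
        return (self.y0 + self.y1) / 2


def pair_features(a: Box, b: Box, page_w: float, page_h: float) -> list[float]:
    """Geometric features for the candidate link a -> b (illustrative, not exhaustive)."""
    # Offset of b's center relative to a's center, normalized by page size.
    dx = (b.cx - a.cx) / page_w
    dy = (b.cy - a.cy) / page_h
    # Vertical overlap ratio: close to 1 when a and b sit on the same text line.
    v_overlap = max(0.0, min(a.y1, b.y1) - max(a.y0, b.y0)) / max(
        1e-6, min(a.y1 - a.y0, b.y1 - b.y0))
    # Horizontal overlap ratio: close to 1 when b is stacked above or below a.
    h_overlap = max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0)) / max(
        1e-6, min(a.x1 - a.x0, b.x1 - b.x0))
    # Keys often end with an ASCII or full-width (Japanese) colon.
    ends_with_colon = 1.0 if a.text.rstrip().endswith((":", "：")) else 0.0
    return [dx, dy, v_overlap, h_overlap, ends_with_colon]


def candidate_pairs(boxes, page_w, page_h, gold=frozenset()):
    """Yield (i, j, features, label) for every ordered pair of boxes.

    label is 1 if (i, j) is an annotated key -> value link in `gold`, else 0.
    """
    for i, j in permutations(range(len(boxes)), 2):
        feats = pair_features(boxes[i], boxes[j], page_w, page_h)
        yield i, j, feats, int((i, j) in gold)
```

The resulting (features, label) rows could be fed to any binary classifier; at inference time one would typically keep the highest-scoring value for each key. Text embeddings of each segment could be appended to the geometric features, which is roughly the idea behind layout-aware models, but that part is not shown here.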

Azure has Form Recognizer, and its marketing says you can train a custom model to associate key-value pairs. Can it handle forms with different layouts? Sometimes the keys are on top and the values below, sometimes the keys are on the left and the values on the right; keys and values can be anywhere on the page. If it can, what techniques might Azure be using? I only need to handle two languages, Japanese and English, plus digits (0-9).
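For the mixed layouts described above (value to the right of the key, or value below it), a simple rule-based baseline can be useful to compare a trained model against. The sketch below assumes the segments have already been split into keys and values (for example by a separate classifier); `Box`, `link_keys_to_values`, and `max_gap` are hypothetical names, and this is not how Azure Form Recognizer works internally.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Hypothetical OCR segment with an axis-aligned bounding box (pixels)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def _overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the overlap between intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def link_keys_to_values(keys, values, max_gap=200.0):
    """Greedily pair each key with the closest unused value that is either
    to its right on the same line or directly below it in the same column.

    Returns {key_text: value_text}. Keys with no candidate within max_gap
    pixels are left out.
    """
    result = {}
    used = set()
    for k in keys:
        best, best_dist = None, float("inf")
        for idx, v in enumerate(values):
            if idx in used:
                continue
            # Same line: vertical extents overlap and v starts after k ends.
            same_row = _overlap(k.y0, k.y1, v.y0, v.y1) > 0 and v.x0 >= k.x1
            # Same column: horizontal extents overlap and v starts below k.
            same_col = _overlap(k.x0, k.x1, v.x0, v.x1) > 0 and v.y0 >= k.y1
            if same_row:
                dist = v.x0 - k.x1
            elif same_col:
                dist = v.y0 - k.y1
            else:
                continue
            if dist <= max_gap and dist < best_dist:
                best, best_dist = idx, dist
        if best is not None:
            used.add(best)
            result[k.text] = values[best].text
    return result


keys = [Box("氏名", 0, 0, 40, 10), Box("Date", 0, 50, 40, 60)]
values = [Box("山田太郎", 50, 0, 120, 10), Box("2023-01-01", 0, 65, 80, 75)]
print(link_keys_to_values(keys, values))
```

Rules like this break on tables, multi-line values, and keys whose values sit above or to the left, which is one reason a learned pairwise model is usually preferred once annotated forms are available.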