Custom loss using an NLP metric after OCR

Hi, I'm working on a problem where we're given images of text that may be faded or generally low resolution, and the task is to OCR the image. I'm exploring the idea of using something like SRGAN to improve the quality of the image, and I'd like to add a loss component that OCRs the generated image, compares the output to the ground-truth text, and uses a distance metric (edit distance, etc.) to provide some notion of how good the image is from an OCR point of view. Does this make sense? How would I do something like this?
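To make the question concrete, here is a minimal sketch of the metric piece I have in mind (the function names `levenshtein` and `ocr_quality` are just placeholders I made up): compute the edit distance between the OCR output of the generated image and the ground-truth text, normalized so that 0 means a perfect read.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                  # deletion
                cur[j - 1] + 1,               # insertion
                prev[j - 1] + (ca != cb),     # substitution (0 if chars match)
            ))
        prev = cur
    return prev[-1]

def ocr_quality(pred_text: str, gt_text: str) -> float:
    """Normalized edit distance in [0, 1]; 0.0 means the OCR output matches GT exactly."""
    longest = max(len(pred_text), len(gt_text))
    if longest == 0:
        return 0.0
    return levenshtein(pred_text, gt_text) / longest
```

My worry is that this edit distance is computed on decoded strings, so it isn't differentiable and can't backpropagate into the SRGAN generator directly. Is the usual workaround to use it only as a validation metric (or RL-style reward), and instead use something differentiable, like the CTC loss of a frozen pretrained recognizer run on the generated image, as the actual training loss term?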