Validation in image captioning

Do we need to monitor the validation loss for image captioning, where a single image can have multiple ground-truth captions?
I want to know whether we can expect the validation loss to decrease over epochs when a single image can have multiple ground truths, as in image captioning.