Different performance of model on simulated and experimental data

I’m working on an image denoising task and have trained 2-3 models so far. All of them perform great at denoising simulated noisy images, but they fail to achieve comparable results on images from the experimental setup.
Can anyone help me understand why this might be the case?
P.S. I preprocess both sets of images using the exact same pipeline.

“Real world” cases often fail if the image statistics are not equal, e.g. if the modality used to capture the images changes in some way (say, web images for training vs. smartphone images at test time).
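
A quick sanity check is to compare basic intensity statistics of the two domains after your preprocessing pipeline. This is just a minimal sketch; `simulated` and `experimental` are hypothetical arrays of your preprocessed images:

```python
import numpy as np

def summarize(images):
    """Basic intensity statistics for a stack of grayscale images (N, H, W)."""
    images = np.asarray(images, dtype=np.float64)
    return {
        "mean": images.mean(),
        "std": images.std(),
        "p01": np.percentile(images, 1),
        "p99": np.percentile(images, 99),
    }

# Hypothetical usage: if these summaries differ noticeably, the two domains
# are not statistically equal and the model may not transfer well.
# print(summarize(simulated))
# print(summarize(experimental))
```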

I’m not sure if I understand the use case correctly, but I assume you added the noise to the training images synthetically, while the test images are already noisy?
If that’s the case, did you measure the noise distribution/statistics of the test images and try to apply the same noise to the training images?
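
As a rough starting point, you could estimate the noise level of the experimental images and reuse it when synthesizing noisy training pairs. The sketch below assumes roughly Gaussian, signal-independent noise; `experimental_samples` and `clean_train_img` are hypothetical placeholders for your data:

```python
import numpy as np

def estimate_noise_sigma(noisy):
    """Rough noise sigma from the MAD of a vertical second-difference residual,
    which suppresses most image content (assumes Gaussian, signal-independent noise)."""
    noisy = np.asarray(noisy, dtype=np.float64)
    residual = 2 * noisy[1:-1, :] - noisy[:-2, :] - noisy[2:, :]
    # Var(2x - x_up - x_down) = 6 * sigma^2, and 1.4826 * MAD approximates the std.
    mad = np.median(np.abs(residual - np.median(residual)))
    return 1.4826 * mad / np.sqrt(6)

def add_matched_noise(clean, sigma, rng=None):
    """Add zero-mean Gaussian noise with the measured sigma to a clean training image."""
    rng = rng or np.random.default_rng()
    clean = np.asarray(clean, dtype=np.float64)
    return clean + rng.normal(0.0, sigma, size=clean.shape)

# Hypothetical usage: average the estimate over a few experimental images,
# then corrupt the clean training images with the matched noise level.
# sigma = np.mean([estimate_noise_sigma(img) for img in experimental_samples])
# noisy_train = add_matched_noise(clean_train_img, sigma)
```

If the experimental noise is clearly not Gaussian (e.g. Poisson-dominated or spatially correlated), matching a single sigma won’t be enough and you’d want to model that noise process instead.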