Comparing two DL models across various datasets

anilkunchalaece · June 13, 2021, 10:30am

I’m testing two DL models, M1 and M2, across four datasets: D1, D2, D3, and D4.

On D1 and D4, M2 performs better than M1, but on D2 and D3 it shows no improvement.

Is this normal?
Or do I have to change my architecture so that M2 performs better than M1 on every dataset?