I’m testing two deep learning models, M1 and M2, across four datasets: D1, D2, D3, and D4.
On D1 and D4, M2 performs better than M1, but on D2 and D3 it shows no improvement.
- Is this normal?
- Or do I have to change my architecture so that M2 always outperforms M1 on every dataset?