Hi Mahammad!
Okay, as I understand it, you want to train on data from scene A.
But you want your model to work on data from scenes B, C, D and E, F, G.
Yes, given your use case, this could be considered a kind of data leak.
This is an issue all the time in the real world. Let's say you train a
self-driving-vehicle model on dirt road A, city street B, and highway C.
You really can't expect such a model to work on a bunch of other roads.
But if you train on a lot of different dirt roads and city streets and highways,
your model might work on various roads it wasn't trained on.
Clear your mind of thoughts of imbalanced data.
Your problem is that your training data is not sufficiently representative
of the data you want to apply your model to (hence leading to overfitting
issues). (Note: techniques that reduce overfitting can help your model
"generalize" to data that differs somewhat in character from the data it
was trained on. But you can only push this so far; at some point you
have to train on data that is reasonably representative of the data you
actually want to apply it to.)
Issues of data imbalance are independent of your main issue, and my
working assumption is that pos_weight
will be enough to address the
data imbalance you have.
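Since pos_weight came up, here is a minimal sketch (with made-up class counts) of how it is typically passed to BCEWithLogitsLoss in PyTorch. The counts and tensor shapes below are just illustrative; substitute the positive/negative ratio from your own dataset:

```python
import torch
import torch.nn as nn

# Hypothetical counts: suppose you have ~10x more negative samples
# than positive ones in your training data.
num_neg, num_pos = 1000.0, 100.0
pos_weight = torch.tensor([num_neg / num_pos])  # weight positives 10x

# pos_weight scales the loss contribution of the positive class,
# counteracting the imbalance.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(8, 1)  # raw model outputs (no sigmoid applied)
targets = torch.randint(0, 2, (8, 1)).float()
loss = loss_fn(logits, targets)
```

Note that BCEWithLogitsLoss expects raw logits, not probabilities; it applies the sigmoid internally.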
Yes. Going back to your notion of "scenes," let's say that your goal is to
have your model work on data from scenes B through G. But when you
train on data from scene A, your model turns out to learn about the specific
details of scene A, rather than the general character of your actual use
case. (This effect is indeed what we call overfitting.)
It's fair to consider training on data from scenes B through G a data leak.
So what you need to do is train on data from other scenes, say H, I, J, K, ...,
that are representative of (but not the same as) the scenes you want to
apply your model to.
You're not "cheating" (data leak) because your model isn't learning any details
of scenes B through G. Instead, your model isn't learning the specific details of
scene A (because it is being forced to learn the shared general
character of a bunch of different scenes). As long as the "details" of your
training scenes are representative of (but not the same as) the "details" of
the scenes to which you will apply your model, you should be able to train
your model to work for your use case.
Yes, basically fewer parameters.
This was to be expected. The greater the "capacity" of your model (more
parameters, roughly speaking), the more capacity it will have to learn the
irrelevant details of your training data and overfit.
As a rule of thumb, if you can't fit (including overfit) your training data,
you might try a model with greater capacity, but if you overfit too easily,
you might try a model with lower capacity.
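If it helps to make "fewer parameters" concrete, here is a quick way to count them. The two toy networks below are just illustrative; only the layer widths differ, and the narrower one has far fewer parameters:

```python
import torch.nn as nn

def count_params(model):
    # Total number of trainable scalars across all parameter tensors.
    return sum(p.numel() for p in model.parameters())

# Hypothetical "bigger" vs. "smaller" versions of the same architecture:
big = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 1))
small = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

print(count_params(big))    # 4609
print(count_params(small))  # 577
```

Shrinking the hidden width from 256 to 32 cuts the parameter count by roughly 8x here, which is the kind of capacity reduction that can tame overfitting.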
One last note: You can sometimes address overfitting by fine-tuning a
pre-trained version of your model, rather than training the model from
scratch. But this depends on having an appropriately pre-trained model.
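A common fine-tuning recipe is to freeze the pretrained weights and train only a new task-specific head. The tiny backbone below is a stand-in for whatever pretrained model you would actually load (e.g. a torchvision model or your own checkpoint); the layer sizes are made up:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone; in practice you would
# load real pretrained weights here rather than random ones.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))

# Freeze the pretrained weights so fine-tuning doesn't disturb them.
for p in backbone.parameters():
    p.requires_grad = False

# New task-specific head, trained from scratch on your own data.
head = nn.Linear(8, 1)
model = nn.Sequential(backbone, head)

# Only the head's parameters will receive gradient updates.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=1e-2)
```

With most of the parameters frozen, the model has far less freedom to memorize the irrelevant details of a small training set, which is why fine-tuning can help with overfitting.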
Best.
K. Frank