I am working on a binary classification problem and have four different datasets:
- Dataset_1: Ground truth of matching pairs (e.g. columns x, y)
- Dataset_2: Graph features of x1, x2, …, xn
- Dataset_3: Numerical features of x1, x2, …, xn
- Dataset_4: Graph features of y1, y2, …, yn
Because the data types differ, it is not feasible to join them into a single dataset.
I intend to train a separate model on each feature dataset and concatenate their outputs as the input to an ensemble model:
- Dataloader_1(Dataset_1) → Matching pairs (e.g. x10/y10)
- Dataloader_2(Dataset_2) → x10 → Model_2 → Output 2
- Dataloader_3(Dataset_3) → x10 → Model_3 → Output 3
- Dataloader_4(Dataset_4) → y10 → Model_4 → Output 4
- Output 2, 3, 4 → Model_5 → Final prediction
However, I am unsure of the following:
Positive Training Examples:
How do I get Dataloader_2, Dataloader_3, and Dataloader_4 to return features for the correct matching IDs?
For example, when training on the x5/y5 pair, Dataloader_2 and Dataloader_3 should return the features of x5, and Dataloader_4 should return the features of y5. The purpose is to train the model to learn which pairs match.
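One possible way to keep the IDs aligned is to replace the three independent dataloaders with a single map-style dataset that iterates over the ground-truth pairs and looks up each feature set by ID. The sketch below is only an illustration: the class name `PairDataset`, the dict-based feature stores, and the toy values are all assumptions, not part of the setup described above.

```python
class PairDataset:
    """Map-style dataset with one item per ground-truth (x_id, y_id) pair.

    Assumption: each feature store is a dict-like lookup keyed by ID.
    The real graph and numerical stores may need a different accessor.
    """

    def __init__(self, pairs, x_graph, x_num, y_graph):
        self.pairs = list(pairs)   # Dataset_1: matching (x_id, y_id) pairs
        self.x_graph = x_graph     # Dataset_2: graph features of x
        self.x_num = x_num         # Dataset_3: numerical features of x
        self.y_graph = y_graph     # Dataset_4: graph features of y

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        # The pair drives every lookup, so the three feature sets
        # always refer to the same x and its matching y.
        x_id, y_id = self.pairs[idx]
        return {
            "x_graph": self.x_graph[x_id],
            "x_num": self.x_num[x_id],
            "y_graph": self.y_graph[y_id],
            "label": 1,  # every ground-truth pair is a positive example
        }


# Toy stand-in data (hypothetical values, for illustration only)
pairs = [("x5", "y5"), ("x6", "y6")]
x_graph = {"x5": [0.1, 0.2], "x6": [0.3, 0.4]}
x_num = {"x5": [5.0], "x6": [6.0]}
y_graph = {"y5": [0.5, 0.6], "y6": [0.7, 0.8]}

dataset = PairDataset(pairs, x_graph, x_num, y_graph)
sample = dataset[0]  # features of x5 and y5, served together
```

Assuming PyTorch, an object like this can be wrapped in a single `torch.utils.data.DataLoader`, and each batch then carries the inputs for Model_2, Model_3, and Model_4 side by side. Features that are not plain tensors (graph objects, for instance) may need a custom `collate_fn`.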
Negative Training Examples:
How do I get Dataloader_2, Dataloader_3, and Dataloader_4 to return deliberately mismatched IDs?
For example, when training on the x6/y6 pair, Dataloader_2 and Dataloader_3 should return the features of x6, but Dataloader_4 should return the features of any y other than y6. The purpose is to train the model to learn which pairs do not match.
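Negative examples could be produced by random negative sampling inside a pair-driven dataset: for each ground-truth pair, also emit the same x combined with a randomly chosen y that is not a known match. The following is a minimal sketch under that assumption; the class name, the `neg_per_pos` parameter, the dict-based feature stores, and the toy data are hypothetical.

```python
import random


class PairDatasetWithNegatives:
    """Each ground-truth pair yields 1 positive and `neg_per_pos` negatives.

    Assumption: feature stores are dict-like lookups keyed by ID.
    """

    def __init__(self, pairs, x_graph, x_num, y_graph, neg_per_pos=1, seed=0):
        self.pairs = list(pairs)
        self.x_graph, self.x_num, self.y_graph = x_graph, x_num, y_graph
        self.neg_per_pos = neg_per_pos
        self.rng = random.Random(seed)
        self.y_ids = sorted(y_graph)
        # Known matches per x, so a sampled negative is never a true pair.
        self.matches = {}
        for x_id, y_id in self.pairs:
            self.matches.setdefault(x_id, set()).add(y_id)

    def __len__(self):
        return len(self.pairs) * (1 + self.neg_per_pos)

    def __getitem__(self, idx):
        # Slot 0 is the positive; slots 1..neg_per_pos are negatives.
        pair_idx, slot = divmod(idx, 1 + self.neg_per_pos)
        x_id, y_id = self.pairs[pair_idx]
        label = 1
        if slot > 0:
            # Linear scan for clarity; a large y set would want something faster.
            candidates = [y for y in self.y_ids if y not in self.matches[x_id]]
            y_id = self.rng.choice(candidates)
            label = 0
        return {
            "x_id": x_id,
            "y_id": y_id,
            "x_graph": self.x_graph[x_id],
            "x_num": self.x_num[x_id],
            "y_graph": self.y_graph[y_id],
            "label": label,
        }


# Toy stand-in data (hypothetical values, for illustration only)
pairs = [("x5", "y5"), ("x6", "y6"), ("x7", "y7")]
x_graph = {"x5": [0.1], "x6": [0.2], "x7": [0.3]}
x_num = {"x5": [5.0], "x6": [6.0], "x7": [7.0]}
y_graph = {"y5": [0.5], "y6": [0.6], "y7": [0.7]}

dataset = PairDatasetWithNegatives(pairs, x_graph, x_num, y_graph)
positive = dataset[2]  # x6 with its true match y6, label 1
negative = dataset[3]  # x6 with a sampled y other than y6, label 0
```

Because the negative is drawn on each access, the same index can return a different mismatched y across epochs. The ratio of negatives to positives is a design choice; `neg_per_pos=1` keeps the two classes balanced.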
Thanks for your help!