How to improve accuracy through stacked generalization


I have developed two models from scratch, one based on convolutions and the other on vision transformers. They reach test accuracies of 85% and 88%. I saved both models and built an ensemble from them: I froze their layers, chopped off the output layers, concatenated the resulting features, and added 2 dense layers and a new output layer on top.

However, when I train this ensemble, it shows a train and val accuracy of 100% for 2 epochs, but the test accuracy drops to 80%.

I know that it's overfitting, but I don't know what else to do about it. Can anyone suggest how to use these 2 models to improve the accuracy even further? I also tried cross-validation, but still saw no improvement.

My dataset:
370 samples for train
24 samples for val
60 samples for test
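
To put these split sizes in perspective, here is a rough back-of-the-envelope sketch in plain Python (standard library only). It is not part of my training code: the helper names and the choice of a normal-approximation (Wald) interval are assumptions made purely for illustration. It shows how much a single sample moves the accuracy on each split, and roughly how wide the uncertainty on a test accuracy is with only 60 samples.

```python
import math

# Split sizes as stated above.
SPLITS = {"train": 370, "val": 24, "test": 60}


def accuracy_step(n_samples: int) -> float:
    """Change in accuracy caused by a single sample flipping."""
    return 1.0 / n_samples


def wald_interval(accuracy: float, n_samples: int, z: float = 1.96) -> tuple:
    """Approximate 95% interval for an accuracy measured on n_samples.

    Uses the normal approximation to the binomial, which is only a
    rough guide at sample sizes this small.
    """
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


if __name__ == "__main__":
    for name, n in SPLITS.items():
        print(f"{name}: one sample = {100 * accuracy_step(n):.2f} accuracy points")

    # The three test accuracies reported in this post.
    for label, acc in [("base model", 0.85), ("base model", 0.88), ("ensemble", 0.80)]:
        low, high = wald_interval(acc, SPLITS["test"])
        print(f"{label}: {acc:.2f} on test, roughly [{low:.2f}, {high:.2f}]")
```

Under these assumptions the three intervals overlap heavily, and one val sample alone is worth about 4 accuracy points, so the numbers above should be read with that noise in mind.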

Thank you