Using an autoencoder and a classifier to reduce cross-dataset differences in CT scans

Hello there,
In my recent research I tried to minimize the cross-dataset differences between CT scans made by different CT machines in different countries, with the goal of making the data perform better when used together to train a classifier.
For this I tried using a classic VGG16 autoencoder together with an imaging technology classifier, which is used to train the autoencoder before the actual medical classifier is trained.
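
To make the setup more concrete, here is a minimal sketch of the kind of combined objective meant here. It assumes an adversarial arrangement, in which the autoencoder is rewarded when the imaging technology classifier fails to identify the source machine; that is one possible reading of the description, not a confirmed detail. The function names and the weight `lam` are hypothetical and purely illustrative, and plain Python lists stand in for image tensors.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between two equally sized flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(probs, label):
    """Cross-entropy of the imaging technology classifier for a single image.

    probs: predicted class probabilities (assumed to sum to 1)
    label: index of the true source machine
    """
    return -math.log(max(probs[label], 1e-12))  # clamp to avoid log(0)

def autoencoder_loss(x, x_hat, machine_probs, machine_label, lam=0.1):
    """Hypothetical adversarial objective for the autoencoder.

    The reconstruction term keeps the image content, while the negative
    classifier term rewards outputs whose source machine is hard to identify.
    lam is an assumed trade-off weight, not a value from the experiments.
    """
    return mse(x, x_hat) - lam * cross_entropy(machine_probs, machine_label)

# Perfect reconstruction with a maximally confused two-class classifier.
example = autoencoder_loss([0.0, 1.0], [0.0, 1.0], [0.5, 0.5], 0, lam=0.1)
```

In a real training loop the classifier itself would be updated in alternation with the autoencoder, using the ordinary (positive) cross-entropy.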
Our results are unfortunately quite unstable, and we cannot achieve good performance on same-dataset and cross-dataset images at the same time.
Is the approach fundamentally flawed, or do you have any recommendations and tips?