I’m working on an unsupervised domain adaptation task that involves synthetic, largely grayscale images (let’s call this dataset A) and real images (dataset B). The goal is to improve performance on dataset B while using only the labels from dataset A.
In the first step of my training process, I pre-train a ResNet model (which uses batch norm) on dataset A. After pretraining, I evaluate on dataset B and get an accuracy of around 50%.
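The pretraining step is roughly the following (an illustrative sketch, not my actual code; the model, loader_A, class count, and hyperparameters are placeholders):

import torch
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(num_classes=10)  # placeholder class count; ResNet uses BatchNorm2d throughout
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def pretrain():
    model.train()
    for images_A, labels_A in loader_A:  # labeled synthetic data only
        optimizer.zero_grad()
        loss = criterion(model(images_A), labels_A)
        loss.backward()
        optimizer.step()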
Now comes the problem.
In my adaptation training code (def train()), I forward-pass images from both dataset A and dataset B in a single iteration. For example:
def train():
    model.train()
    for dataset_A_images, dataset_A_labels, dataset_B_images in dataloader:
        # both domains pass through the same model (and its batch norm layers)
        output_A = model(dataset_A_images)
        output_B = model(dataset_B_images)
        # ...compute the supervised loss on A and the adaptation loss on B, then backprop
Training in this fashion, I achieve a training accuracy of around 96% on dataset A, which is good, but…
when I switch the model to evaluation mode in my evaluation function (def evaluate()), the accuracy on the very same dataset A that reached 96% during training drops to 7%, with an absurdly large cross-entropy loss.
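For context, evaluate() is roughly the standard loop (again a sketch, not the exact code):

def evaluate(loader):
    model.eval()  # batch norm now uses its running mean/var instead of batch statistics
    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total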
What’s weird is that I’ve also tried evaluating in train mode, and the accuracy still drops to less than 10%.
I suspect this is an issue with (or misuse of) batch norm, but I can’t figure out why the performance drops so drastically.
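In case it helps with the diagnosis, this is roughly how one could snapshot the batch norm running statistics before and after an adaptation epoch to see whether they shift (an illustrative check, not part of my actual code):

def bn_stats_snapshot(model):
    # copy every BN layer's running statistics so they can be compared later
    stats = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            stats[name] = (module.running_mean.clone(), module.running_var.clone())
    return stats

before = bn_stats_snapshot(model)
train()  # one adaptation epoch
after = bn_stats_snapshot(model)
for name in before:
    drift = (after[name][0] - before[name][0]).abs().mean().item()
    print(f"{name}: mean drift {drift:.4f}")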
To summarize, the accuracy on dataset A (which is not split into train/val) takes the following rollercoaster journey:
After pretraining: 95%
During main training code: 96%
Evaluation right after training 1 epoch: 7%
During the next training epoch: 96% again.
I’ve been pulling my hair out over this issue. Thanks in advance to anyone who takes the time to read this long post!
PS: I can’t share certain aspects of the code due to security reasons, but I’m happy to provide a more detailed description if the problem is not clear enough.