Validation accuracy reaches 1.0 in the first epoch, so weird

Hi guys, I use a ResNet for three binary classification tasks: ADvsNC, MCIvsNC, and ADvsMCI. NC, MCI, and AD denote normal patients, and mild and severe stages of the disease, respectively. For ADvsNC and MCIvsNC, the training processes look reasonable: the training and validation loss decreases and the accuracy increases over the epochs. However, for ADvsMCI, the training and validation loss is almost 0 and the accuracy is almost 1.0 at the first epoch. The results are weird, because ADvsMCI is a harder task than ADvsNC. It shouldn’t get the best result, especially in the first epoch!

I used the same code for all three tasks, so the code itself should be fine, since it works well on the other two tasks. That suggests the problem is in the dataset.

So I checked the data. I had copied the AD samples from ADvsNC and the MCI samples from MCIvsNC to form ADvsMCI. At first I thought maybe I had mixed AD with NC and MCI with NC, so I tried copying the NC samples from ADvsNC and from MCIvsNC as AD and MCI, respectively. The training process is still weird as before. Now I have no idea what’s going on. Could anyone give me a hand? Thank you very much!

PS: I used the trained model to evaluate the training, test, and validation sets of ADvsMCI, and the accuracy is not 1.0.

If anything doesn’t make sense, please let me know.

It sounds like a data related problem.

I don’t really understand the following statement

PS: I used the trained model to evaluate the training, test, and validation sets of ADvsMCI, and the accuracy is not 1.0.

Did you check the training, validation and test errors on ADvsMCI after the shuffling described in the paragraph before?

Can you load the dataset completely into RAM?
If so, it is a bit tedious, but you could check for re-occurrences of training instances in the validation and test set.
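Even without loading everything into RAM, something like this could check for duplicate files across the splits by hashing one file at a time (the directory layout is just an assumption; adapt the paths to your dataset):

```python
import hashlib
from pathlib import Path

def file_hashes(root):
    # Hash every file under `root`; duplicate samples share a hash,
    # so the whole dataset never has to sit in RAM at once.
    hashes = {}
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            h = hashlib.md5(p.read_bytes()).hexdigest()
            hashes.setdefault(h, []).append(str(p))
    return hashes

def overlap(split_a, split_b):
    # Hashes occurring in both splits point to leaked samples.
    return set(file_hashes(split_a)) & set(file_hashes(split_b))
```

Any non-empty overlap between the training split and the validation or test split would explain a (near-)perfect score.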

Thanks for your reply @ptrblck!

For the statement,

I mean, I first saved the best model during the training process to a file (torch.save). I then loaded the best model from the file (torch.load) to evaluate the training, validation, and test data again, but the validation and test accuracy could not reach 1.0. This is also weird. It seems the best model is not trained, evaluated, and tested on the same dataset, but I use the same data paths for the training, validation, and test data.
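My save/load flow is roughly like this (a simplified sketch; the file name and the tiny stand-in model are just for illustration — on a toy model like this the restored weights do reproduce the original outputs):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for the actual ResNet

# Save only the parameters of the best model.
torch.save(model.state_dict(), "best_model.pth")

# Later: rebuild the same architecture and load the weights back.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load("best_model.pth"))
restored.eval()

# A plain linear layer restored this way gives identical outputs.
with torch.no_grad():
    x = torch.randn(8, 4)
    assert torch.allclose(model(x), restored(x))
```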

What do you mean by checking the training, validation, and test errors after the shuffling? I only shuffle the training data. Do you mean calculating the error after each epoch?

The dataset is quite big; it can’t be loaded into RAM completely. I also checked the training, validation, and test data. There is no overlap among them.

Maybe I’m a bit slow today, but let’s clarify some aspects. :wink:

So during the training of ADvsMCI your model reaches an accuracy of 1.0 for the training and validation set. How is the accuracy for the test set (before saving and loading the model again)?
Are you calling model.eval() before calculating the validation and test accuracy? Did you call model.train() before the next training step?
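The pattern I have in mind is something like this (a minimal sketch with a toy model and an in-memory stand-in for a DataLoader; the `validate` name is just illustrative):

```python
import torch
import torch.nn as nn

def validate(model, loader):
    model.eval()               # put batch norm / dropout into eval mode
    correct = total = 0
    with torch.no_grad():
        for data, target in loader:
            pred = model(data).argmax(dim=1)
            correct += (pred == target).sum().item()
            total += target.numel()
    model.train()              # switch back before the next training step
    return correct / total

model = nn.Sequential(nn.Linear(4, 2), nn.Dropout(p=0.5))
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,)))]
acc = validate(model, loader)
```

Forgetting either of the two mode switches can make the reported validation accuracy unreliable.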

After saving and loading the same model (which had an accuracy of 1.0 for ADvsMCI), it cannot reach the same accuracy anymore? Neither for the training set, nor for the validation and test set?

If I understood it correctly, something “changed” after you loaded the model, am I right?

With “shuffling” I meant the reassigning of the data sets:

I copied the AD samples from ADvsNC and the MCI samples from MCIvsNC to form ADvsMCI. At first I thought maybe I had mixed AD with NC and MCI with NC, so I tried copying the NC samples from ADvsNC and from MCIvsNC as AD and MCI, respectively. The training process is still weird as before. Now I have no idea what’s going on.

Could you provide a code snippet with your implementation and with smaller datasets?

I just checked the training records and found I didn’t evaluate the test set before saving and loading the model again. I will check it. But after the best model was loaded, the accuracy values of the training, validation, and test sets were much lower: just 0.3614, 0.3547, and 0.3547.

I didn’t use model.eval(), but called model.train(False) before evaluation. I then set model.train(True) before the next training step.
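As far as I know, model.train(False) and model.eval() are interchangeable — eval() is defined as train(False) in PyTorch, so both toggle the same flag:

```python
import torch.nn as nn

m = nn.Dropout()
m.train(False)
assert m.training is False   # same flag that eval() flips

m.train(True)
m.eval()
assert m.training is False
```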

After saving and then loading the best model (which had an accuracy of 1.0 for ADvsMCI), none of the accuracy values reach 1.0 again; they are all below 0.4.

Something may have changed after I load the model, but I can’t tell what it might be. Besides, the training process is also weird: the accuracy values should not reach 1.0 at the first epoch. What do you think?

Here is the training code:

Here is the evaluation code:

Thank you @ptrblck

I don’t see any obvious errors in the code.
Is the training and validation accuracy staying at 1.0 for ADvsMCI or is it peaking at the first epoch?
Do you train the same net on these different data sets sequentially or are you using a new model for each dataset?

Is the accuracy for training/validation using the other data sets comparable to the accuracies after loading the model again?

Sorry for my late reply @ptrblck .

Yes, it always stays at 1.0 for ADvsMCI. I trained the same net for ADvsNC and ADvsMCI. The net performed normally on the other datasets, just abnormally on ADvsMCI. I haven’t tried other models yet. That is a good way to narrow down the problem :+1:. I will try it. I will also check the accuracy for the other data sets.

Thank you for these valuable suggestions. I will try them first and then share the results with you.

Hi @ptrblck

I’ve run all three datasets again. Now the accuracy values of the training, validation, and test sets before and after loading the best model stay the same for all three tasks, but for ADvsMCI, the accuracy values of the training, validation, and test sets are always 1.0 from the first epoch, even if I use different models such as ResNet18, ResNet34, and Inception v3.

Do you have any idea about the abnormal learning process? Thank you very much.

Do you think your data might just be easy enough to be classified perfectly? You mentioned ADvsMCI should be the hardest dataset.
Is it possible for you to upload this dataset? I could check the splitting for consistency. I have access to a machine with 256GB of RAM, if that’s large enough.

Yes, ADvsMCI is the hardest set. Its accuracy should be lower than that of ADvsNC and MCIvsNC.

I don’t have the authorization to upload the dataset to this website. The size is almost 3GB; without the test set, it is still almost 2GB. How about sending you the data separately through email?

Sure, just host it somewhere and I can have a look.