I have training, validation, and test datasets (an NLP problem, so I used LSTM/GRU). The model contains a batch norm layer (I think this is the reason for the discrepancy I am observing). I don't have true labels for the test dataset. This was my training procedure before:
- Train on the training dataset for 5 epochs (model.train() was used). After each epoch, I make predictions on the validation dataset to see my results (model.eval() was used) and save the model if the validation score (AUC as the metric) increased. At the end, I load the best model and make predictions on the test data (model.eval() was used).
This is my pipeline:
for epoch_number in range(epochs):
    train the model on the training dataset
    make predictions on the validation dataset
    save the model with torch.save(model.state_dict(), fname) if the validation score increased
load the best model and make predictions on the test data
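To make the steps concrete, here is a small self-contained sketch of that first procedure. The toy model, the random tensors, the file name best_model.pt, and sklearn's roc_auc_score are stand-ins I am adding for illustration; the real LSTM/GRU model, data loaders, and metric code are not shown in the question.

```python
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

torch.manual_seed(0)

# Toy stand-in for the real model; a BatchNorm layer is included to mirror the question.
model = nn.Sequential(nn.Linear(16, 32), nn.BatchNorm1d(32), nn.ReLU(), nn.Linear(32, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random tensors standing in for the real train / validation / test splits.
x_train, y_train = torch.randn(256, 16), torch.randint(0, 2, (256, 1)).float()
x_val, y_val = torch.randn(64, 16), torch.randint(0, 2, (64, 1)).float()
x_test = torch.randn(64, 16)

epochs, fname, best_auc = 5, "best_model.pt", 0.0

for epoch_number in range(epochs):
    # Train on the training dataset (updates weights and BatchNorm running stats).
    # One full-batch step per epoch here, only for brevity.
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x_train), y_train)
    loss.backward()
    optimizer.step()

    # Make predictions on validation (no weight updates; BatchNorm uses running stats).
    model.eval()
    with torch.no_grad():
        val_scores = torch.sigmoid(model(x_val)).squeeze(1)
    val_auc = roc_auc_score(y_val.squeeze(1).numpy(), val_scores.numpy())

    # Save the model if the validation score increased.
    if val_auc > best_auc:
        best_auc = val_auc
        torch.save(model.state_dict(), fname)

# Load the best model and make predictions on the test data.
model.load_state_dict(torch.load(fname))
model.eval()
with torch.no_grad():
    test_preds = torch.sigmoid(model(x_test))
```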
For 5 epochs, my train loss and validation loss are:
1) train loss = 0.2599791 , val loss = 0.2254444
2) train loss = 0.2198705 , val loss = 0.2254712
3) train loss = 0.2080045 , val loss = 0.2124491
4) train loss = 0.1860864 , val loss = 0.18708
5) train loss = 0.1701995 , val loss = 0.1935813
Recently I changed the procedure a little bit. It is as follows:
- Train on the training dataset for 5 epochs (model.train() was used). After each epoch, I make predictions on both the validation and test datasets (model.eval() was used for both) and then save the validation and test predictions for future use. This is my pipeline:
for epoch_number in range(epochs):
    train the model
    make predictions on validation and test
    save the predictions
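Relative to the sketch above, only the loop body changes: instead of checkpointing the model, the per-epoch validation and test predictions are written out. Reusing the placeholder names from that sketch (model, x_val, x_test), it looks roughly like:

```python
for epoch_number in range(epochs):
    # Training phase is identical to the first sketch (model.train(), backward, step).
    ...

    # Make predictions on both validation and test (model.eval() for both).
    model.eval()
    with torch.no_grad():
        val_preds = torch.sigmoid(model(x_val))
        test_preds = torch.sigmoid(model(x_test))

    # Save the predictions for future use instead of saving the model.
    torch.save({"epoch": epoch_number, "val": val_preds, "test": test_preds},
               f"preds_epoch_{epoch_number}.pt")
```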
This is where I am seeing the discrepancy. Obviously, the train loss and validation loss should stay the same as in the first procedure, because making predictions at every epoch does not update the weights. But these are my train loss and validation loss:
1) train loss = 0.2599791 , val loss = 0.2254444
2) train loss = 0.2196528 , val loss = 0.2283426
3) train loss = 0.2078255 , val loss = 0.1996013
4) train loss = 0.1848111 , val loss = 0.182577
5) train loss = 0.1680537 , val loss = 0.1896651
Note that the train loss and validation loss for the first epoch are the same in both cases (this is the sole reason I have provided these numbers), and then they diverge after the first epoch.
Why is this discrepancy happening?
I have been thinking about this problem for some days, and here are my thoughts:
- When you train with batch norm, the layer keeps running estimates and then uses these running estimates in model.eval() mode. But I think the running estimates are also changed in eval mode, and this is the reason for the discrepancy. Maybe I am wrong; it's just my view.
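One way to check that part of the hypothesis directly is to compare a BatchNorm layer's running buffers before and after an eval-mode forward pass. A minimal standalone sketch (a bare nn.BatchNorm1d rather than the actual model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

# Train-mode forward: running_mean / running_var are updated from the batch statistics.
bn.train()
_ = bn(torch.randn(8, 4))
mean_after_train = bn.running_mean.clone()

# Eval-mode forward on deliberately shifted data.
bn.eval()
_ = bn(torch.randn(8, 4) + 5.0)
mean_after_eval = bn.running_mean.clone()

# True only if the eval-mode forward left the running estimates untouched.
print(torch.equal(mean_after_train, mean_after_eval))
```

If this prints True, the eval-mode forward passes did not touch the running estimates.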