Save the best model

Thanks for your reply.
When I train the model and, in the same notebook, save it and reload it, everything works fine.
But when I load the saved model from a new notebook or script, the model’s output is not correct. In fact, it produces a different prediction on every run with the same input! It seems as if the model were in training mode rather than evaluation mode. I’ve packed all of my work into this repo, and the training+prediction notebook shows how I run the prediction.

I assume you’ve called model.eval() and this behavior was not observed in the training notebook (also after calling model.eval())?
If so, then something seems to break inside the model, so could you post a minimal, executable code snippet to reproduce the issue?
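For example, something along these lines should print True once the model is in eval mode (a generic sketch; YourModel, model_args, example_input and the checkpoint path are placeholders for your actual code):

import torch

# generic sketch: YourModel, model_args and example_input are placeholders
# for the actual model class, constructor arguments and input tensor
model = YourModel(*model_args)
model.load_state_dict(torch.load('path/to/checkpoint.pth', map_location='cpu'))
model.eval()  # disables dropout and uses the running batchnorm statistics

with torch.no_grad():
    out1 = model(example_input)
    out2 = model(example_input)

# two forward passes over the same input must match exactly in eval mode
print(torch.equal(out1, out2))  # expected: True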

Assume the following scenario:
I have trained the model for 100 epochs and saved the best model as bestmodel.pth.
Then I use the following script, predict.py, for prediction:

#!/usr/bin/env python3

import argparse
import warnings

import torch
from rich import print

from char2vec import BiLSTMtagger
from config import CFG
from data import Data

warnings.filterwarnings("ignore")

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

EMBEDDING_DIM = CFG.out_ch2
HIDDEN_DIM = 128
TAGSET_SIZE = Data.label_vocab_size  # en, es, other


def predict(args):
    # rebuild the architecture, load the trained weights and switch to eval mode
    model = BiLSTMtagger(EMBEDDING_DIM, HIDDEN_DIM, TAGSET_SIZE)
    state = torch.load(args.model, map_location=device)
    model.load_state_dict(state, strict=True)
    model.to(device)
    model.eval()

    # encode the whitespace-tokenized input; a '.' is appended as an end marker
    tokens = args.text.split()
    x = Data.embedding_s(Data.chr2id, [tokens + ['.']])

    # most likely tag id per token, mapped back to label names
    with torch.no_grad():
        out = model(torch.LongTensor(x).to(device)).argmax(dim=-1)[0].tolist()
    labels = [Data.id2lbl[i] for i in out]

    # drop the label predicted for the appended '.'
    return labels[:-1]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Testing a pretrained Character-Based CNN+BiLSTM for Code-Switching")
    parser.add_argument("--model",
                        type=str,
                        default="../saved-models/bestmodel.pth",
                        help="path for pre-trained model")
    parser.add_argument("--text",
                        type=str,
                        default="@lililium This is an audio book !",
                        help="text string")

    args = parser.parse_args()
    labels = predict(args)

    print(f'input : {args.text}')
    print(f'prediction : {" ".join(labels)}')

When I launch predict.py in the terminal, every run produces a different output, e.g.:

$ ./predict.py
input : @lililium This is an audio book !
prediction : es en other other other other other
$ ./predict.py
input : @lililium This is an audio book !
prediction : es other other other other en other
$ ./predict.py
input : @lililium This is an audio book !
prediction : es en en other other en other
$ ./predict.py
input : @lililium This is an audio book !
prediction : es en en other en other other
$ ./predict.py
input : @lililium This is an audio book !
prediction : es en other en en en other
$ ./predict.py
input : @lililium This is an audio book !
prediction : es en en other en other other

While the trained model in the notebook gives me this result:

['other', 'en', 'en', 'en', 'es', 'en', 'other']

As you can see, it seems as if the model were still in training mode, since it gives a totally different result on each run! :confused:

The model’s raw output scores are also completely different in each run:

$ ./predict.py
tensor([[[-1.4239e+01, -7.7564e-03, -1.0170e-05, -3.7519e+00],
         [-1.9065e+01, -7.0783e+00, -1.1999e+01, -9.5747e+00],
         [-1.8749e+01, -7.6804e+00, -1.4426e+01, -1.0382e+01],
         [-1.9527e+01, -6.4360e+00, -1.5228e+01, -1.2201e+01],
         [-1.9501e+01, -6.1012e+00, -1.4942e+01, -1.1762e+01],
         [-1.5594e+01, -5.9746e+00, -1.2779e+01, -6.1353e+00],
         [-7.9881e+00, -1.0951e+01, -1.7105e+01, -2.5605e+00],
         [-3.4039e-04, -1.0904e+01, -1.6740e+01, -1.0872e-01]]])
input : @lililium This is an audio book !
prediction : es en en en en en other

$ ./predict.py
tensor([[[-1.1975e+01, -8.6052e-01, -3.9600e+00, -2.8501e+00],
         [-1.5309e+01, -5.4100e+00, -1.0085e+01, -6.5682e+00],
         [-1.0863e+01, -7.2653e+00, -1.1599e+01, -4.9432e+00],
         [-1.5746e-03, -1.1551e+01, -1.2867e+01, -1.2314e-01],
         [-6.9447e+00, -3.0739e+00, -7.4750e-01, -4.0715e+00],
         [-1.1933e+01, -1.2232e+00, -1.2816e+00, -5.9172e+00],
         [-7.4662e+00, -5.9037e+00, -2.5671e+00, -3.5746e+00],
         [-1.2141e+01, -1.4757e+00, -1.8773e+00, -6.3705e+00]]])
input : @lililium This is an audio book !
prediction : en en other <PAD> es en es

Let’s focus on the training script and the general model behavior first, as we might be mixing different issues now. You’ve mentioned that the difference shows up between the training and the prediction script. Are you seeing the same behavior in the training script as well, i.e. does the model return different outputs for the same inputs after calling model.eval() in the training script (after it was trained and before it was saved)?
If so, let’s skip the prediction script for now, as the model itself seems to contain layers that are responsible for the changing predictions. In that case we would need the model architecture to try to isolate the issue.
On the other hand, if the model behaves as expected in the training script, the issue is likely in how the model is loaded and executed in predict.py, and we would need to narrow down what exactly differs between these scripts.
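For example, running the following in both environments and comparing the printouts would show whether the loaded weights and the encoded input actually match (a sketch that reuses the names from your predict.py, not code from your repo):

import hashlib

import torch

from data import Data  # same module predict.py imports

# fingerprint the checkpoint: the hash must be identical in both environments
state = torch.load('../saved-models/bestmodel.pth', map_location='cpu')
blob = b''.join(t.cpu().numpy().tobytes() for t in state.values())
print('weights :', hashlib.md5(blob).hexdigest())

# fingerprint the encoded input: if these ids differ between the notebook and
# predict.py, the preprocessing (e.g. Data.chr2id) is not reproducible
x = Data.embedding_s(Data.chr2id, [['This', 'is', 'a', 'test', '.']])
print('input   :', torch.LongTensor(x).flatten().tolist()[:20])

If the weight hashes match but the encoded ids differ, the problem is in the preprocessing rather than in the model itself.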


I really appreciate your time and consideration.

As I mentioned earlier, I have no issue with training. In the training notebook I load the bestmodel in the last cell in the same way as predict.py does, the prediction is fine, and it always produces the same result, with the same output scores, for a given input.

Dear @ptrblck,

Thanks again for your time. I found the issue in my code. I used a set to remove duplicate tokens and then built the vocabulary dict from it, but I never saved that dict with my model. Since a set’s iteration order is not deterministic across runs (Python randomizes string hashing per process), every run produces a different token-to-id arrangement, and now you can guess what happened … :sweat_smile: :face_holding_back_tears:
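For anyone who runs into the same thing: one way to avoid it is to save the mappings together with the weights, so every run reuses exactly the same arrangement. A rough sketch using the names from predict.py (model and device refer to the objects already defined in the training notebook / prediction script):

import torch

from data import Data

# at save time: bundle the vocabulary mappings with the trained weights so the
# prediction script can never rebuild them in a different (set-dependent) order
torch.save({
    'state_dict': model.state_dict(),
    'chr2id': Data.chr2id,
    'id2lbl': Data.id2lbl,
}, 'bestmodel.pth')

# at load time, e.g. in predict.py: restore the weights and the mappings
ckpt = torch.load('bestmodel.pth', map_location=device)
model.load_state_dict(ckpt['state_dict'])
chr2id, id2lbl = ckpt['chr2id'], ckpt['id2lbl']

Dumping chr2id and id2lbl to a separate file (e.g. with pickle) next to bestmodel.pth would work just as well; the important part is that the mapping is written once at training time and read back, never rebuilt from a set.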

Good to hear you’ve narrowed it down! :slight_smile:
