Saved model performs poorly after loading

Hello Pytorchers,
I am having an issue with a trained model. During training, the model eventually performed very well, but after it was saved and loaded (via state_dicts), it could not give even one valid prediction.

Network structure:

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(64, 128)
        self.sig = nn.Sigmoid()  # Also tried Tanh and ReLU
        self.fc2 = nn.Linear(128, 64)

Optimizer: RMSprop
Criterion: SmoothL1Loss

P.S. The model file is about 66 KB; is that okay?
P.P.S. I tried different hidden layer sizes (32, 64, 128) and different learning rates.

Thank you!
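
On the file size question, a quick sanity check is to count the parameters. The sketch below uses a hypothetical nn.Sequential stand-in with the layer sizes from the post and assumes the default float32 parameters; the real file also carries a little serialization overhead, so this is only a rough lower bound.

```python
import torch.nn as nn

# Hypothetical stand-in for the posted network (layer sizes taken from the post).
model = nn.Sequential(nn.Linear(64, 128), nn.Sigmoid(), nn.Linear(128, 64))

# Count parameters and the raw bytes they occupy (float32 by default).
n_params = sum(p.numel() for p in model.parameters())
n_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(n_params, n_bytes)
```

If the raw byte count is close to the size of the saved file, the checkpoint is probably complete and the small size is expected for a network this small.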

Are you using the same data to evaluate your model?
How was the training and validation error?
Did you make sure to call model.eval() before evaluating the model? Your current model probably doesn’t need it, because it doesn’t contain any Dropout or BatchNorm, but it’s recommended anyway.
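
A minimal sketch of what evaluation usually looks like, assuming a hypothetical stand-in model and a random input in place of the real game state:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the posted network.
model = nn.Sequential(nn.Linear(64, 128), nn.Sigmoid(), nn.Linear(128, 64))

model.eval()  # inference mode for Dropout/BatchNorm; has no effect on these layers
x = torch.randn(1, 64)  # hypothetical input, not real game data

with torch.no_grad():  # skip gradient tracking during evaluation
    out_a = model(x)
    out_b = model(x)

# Without Dropout or BatchNorm, repeated forward passes give identical results.
print(torch.equal(out_a, out_b))
```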

Thanks for your answer, Ptrblck!
I am creating an AI player and training it by playing the game.
Yes, first I train it on the best possible move. After that I call predict and check whether the prediction is correct. If not, I repeat; otherwise I make the move and continue with the game. In training, after about 15 generations (50 * 15 * ~7000 moves), I reach the first models with 100% accuracy.
Yes, I tried model.eval(); it does not change anything.

Could you post your save&load code please?

Hi SimonW,
There is nothing special about my save&load.

I save the model at hardcoded times to data/time/modelname.pkl

    # Save the model
    end = time.time()
    for i in range(6):
        if end - self.start >= self.savedTimes[i] and not self.saved[i]:
            self.saved[i] = True
            path = 'data/' + time.strftime("%Y%m%d-%H%M%S", time.localtime(self.start)) + "/" + self.savedFileNames[i]
            os.makedirs(os.path.dirname(path), exist_ok=True)
            torch.save(..., path)  # first argument (the saved object) is missing from the post


    def __init__(self, game, color, modelName, size):
        self.game = game
        self.color = color
        if size == 32:
            self.model = Net32()
        elif size == 64:
            self.model = Net64()
        elif size == 128:
            self.model = Net128()


P.S. Net32, Net64, and Net128 have the same content except for the number of hidden neurons. I wrote that in a hurry.

You are right. This part looks fine… Could you try to come up with a minimal self-contained script that reproduces the issue? Thanks!
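
Such a script could start from a plain round-trip check like the sketch below. The forward method is an assumption (the post only shows __init__), the input is random rather than real game data, and an in-memory buffer stands in for the .pkl file.

```python
import io
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 128)
        self.sig = nn.Sigmoid()
        self.fc2 = nn.Linear(128, 64)

    def forward(self, x):
        # Assumed forward pass; not shown in the original post.
        return self.fc2(self.sig(self.fc1(x)))

torch.manual_seed(0)
trained = Net()            # stands in for the trained model
x = torch.randn(4, 64)     # hypothetical inputs

buffer = io.BytesIO()      # in-memory stand-in for the saved file
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

restored = Net()           # fresh instance with random weights
restored.load_state_dict(torch.load(buffer))

trained.eval()
restored.eval()
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
print(same)
```

If this prints True for the real model and real file path as well, saving and loading are likely fine and the problem is elsewhere, for example in which object gets saved or how inputs are prepared at prediction time.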

Did you train it on the GPU or the CPU? I ran into issues when I trained a model on the GPU and tried to load it on the CPU.
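
If the device does turn out to matter, torch.load takes a map_location argument that remaps the stored tensors while loading. A minimal sketch, using a hypothetical single layer and an in-memory buffer instead of the real checkpoint file:

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(64, 128)  # hypothetical stand-in for the trained model

buffer = io.BytesIO()       # in-memory stand-in for the saved file
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Remap every stored tensor to the CPU, regardless of where it was saved from.
state = torch.load(buffer, map_location=torch.device("cpu"))
devices = {t.device.type for t in state.values()}
print(devices)

model.load_state_dict(state)
```

A device mismatch usually shows up as an error at load time rather than as silently wrong predictions, so this is worth ruling out but may not be the cause here.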

You’re saving torch.player., but restoring into state.model. That isn’t necessarily wrong, but it seems unintuitive and inconsistent. It might be worth making these names consistent, in case the mismatch is masking the actual bug.
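
One way to check for that kind of mismatch is to compare the keys of the saved state dict with what the target module expects. The sketch below is purely illustrative: the Player wrapper is hypothetical and is not taken from the posted code.

```python
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 128)
        self.fc2 = nn.Linear(128, 64)

class Player(nn.Module):
    # Hypothetical wrapper that owns the network as an attribute.
    def __init__(self):
        super().__init__()
        self.model = Net()

# State dict saved from the wrapper: keys are prefixed with "model.".
saved = Player().state_dict()

# Loading into the bare network with strict=False reports the mismatch
# instead of raising, and leaves the network's weights untouched.
result = Net().load_state_dict(saved, strict=False)
print(result.missing_keys)
print(result.unexpected_keys)
```

With the default strict=True, the same call raises a RuntimeError, so a silent mismatch like this can only happen if strict=False is passed somewhere. Empty missing_keys and unexpected_keys lists mean the saved and restored objects line up.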