Problem loading weights

I am trying to load a trained network, but now it doesn't work.
This is the message I get: " IncompatibleKeys(missing_keys=[], unexpected_keys=[]) "
I've printed the keys of my model and the keys of the trained model I'm trying to load, and they are the same… Help please!

It looks alright, since no missing or unexpected keys were found. :slight_smile:
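To illustrate why this message is harmless: in recent PyTorch versions, load_state_dict returns a named tuple listing any missing or unexpected keys, and an interactive session prints that return value. A minimal sketch with a placeholder nn.Linear model:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# load_state_dict returns an object reporting mismatched keys;
# empty lists mean every parameter was matched successfully.
result = model.load_state_dict(model.state_dict())
print(result)  # IncompatibleKeys(missing_keys=[], unexpected_keys=[])
```

So the message is purely informational, not an error.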

3 Likes

Yes, it's ok :). But until yesterday, I didn't receive that message.

Hello, I'm trying to load a trained model's weights, but when I load them my AI restarts learning from the beginning, as if it didn't take the weights into account.
My code for saving weights :
torch.save({'net': self.net.state_dict(),
            'optim': self.optimizer.state_dict(),
            }, 'ai.bak')
And for loading :
checkpoint = torch.load(self.find_file('ai.bak'))
self.net.load_state_dict(checkpoint['net'])
Help me please!
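For reference, a minimal self-contained sketch of the save/restore pattern above, with a placeholder nn.Linear standing in for the poster's Net class. Both the model and the optimizer state_dicts are saved and restored, which is needed for stateful optimizers such as RMSprop:

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)  # placeholder for the poster's Net()
optimizer = torch.optim.RMSprop(net.parameters(), lr=0.01)

# Save both the model and the optimizer state ('ai.bak' as in the post).
torch.save({'net': net.state_dict(),
            'optim': optimizer.state_dict()}, 'ai.bak')

# Restore both states into fresh objects to resume training.
restored_net = nn.Linear(4, 2)
restored_opt = torch.optim.RMSprop(restored_net.parameters(), lr=0.01)
checkpoint = torch.load('ai.bak')
restored_net.load_state_dict(checkpoint['net'])
restored_opt.load_state_dict(checkpoint['optim'])
```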

Did you also load the optimizer’s state_dict?
Could you post some loss values during training and after restoring?
Also, which optimizer are you using?

1 Like

Yes, I also load the optimizer's state.
I'm using the RMSprop optimizer.
For example after 700 steps, the loss values are
tensor(1.1155, grad_fn=)
But after restoring I get
tensor(0.3977, grad_fn=)

So the loss is actually lower after restoring the training?

1 Like

Sorry, it takes the last value after training, but:
After restoring, the value of the loss is:

tensor(0.4, grad_fn=)

After 800 steps, the value of the loss is:

tensor(1.1, grad_fn=)

After 1000 steps :

tensor(0.6, grad_fn=)

It starts increasing for a while before decreasing when I restore the weights, which is weird.

That’s strange. Do you have a (small) executable code snippet so that we can have a look at this issue?
Are you changing anything else after restoring, e.g. your data, model.eval()/train() etc.?

1 Like

class Ai(Player):
    """
    This class implements an AI based on the perceptron defined in class Net
    """
    def __init__(self):
        super(Ai, self).__init__()

        self.net = Net()
        # Load network weights if they have been initialized already
        exists = os.path.isfile(self.find_file('ai.pth'))
        if exists:
            print('waa')
            checkpoint = torch.load(self.find_file('ai.pth'))
            self.net.load_state_dict(checkpoint['net'])
            self.net.eval()

        self.net1 = Net()
        self.net1.load_state_dict(self.net.state_dict())
        self.net1.eval()

This code is unfortunately not executable.
Anyway, it looks like you are setting the model(s) to .eval() after loading.
Is it also the case for the continuation of your training?
If so, you should set them to .train() mode.
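The train()/eval() toggle mentioned above can be sketched with a small placeholder model. Both calls flip the module's training flag, which changes the behavior of layers such as batch norm and dropout:

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU())

net.eval()   # inference mode: batch norm uses running stats, dropout is off
print(net.training)  # False

net.train()  # switch back before resuming the training loop
print(net.training)  # True
```

The flag is applied recursively to all submodules, so a single call on the top-level model is enough.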

I will try with .train() and see the result, thank you.

The difference in loss between eval() and train() mode can happen if you have any batch norm layers in your model. During inference, batch norm normalizes with the running estimates instead of the current batch statistics, as far as I know.