I am building a version of AlphaZero and am having the problem that my model never learns. model.training is True, and the loss is calculated from the NN outputs, which are not detached.
The model weights never change their values when it’s being trained.
I’m not sure what the error is.
def train_model(self):
    '''
    We calculate and store the loss between the MCTS and NN values/policies.
    Also, the model is trained according to the loss calculated.
    '''
    value_loss = torch.mean((self.values["MCTS"] - self.values["NN"])**2)
    policy_loss = - sum([torch.dot(self.policies["MCTS"][i,:], torch.log(self.policies["NN"][i,:])) for i in range(self.episodes)]) / self.episodes
    total_loss = value_loss + policy_loss
    self.optimizer.zero_grad()
    total_loss.backward()
    self.optimizer.step()
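One way to check the claim that the loss is not detached is to inspect `requires_grad` on the loss and look at the `.grad` fields after `backward()`. The snippet below is only a minimal sketch: the `nn.Linear` model and the random tensors are hypothetical stand-ins for the network and the MCTS/NN values, not the code from the post.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny value head and random MCTS targets.
model = nn.Linear(4, 1)
states = torch.randn(8, 4)
mcts_values = torch.randn(8)            # targets, carry no gradient
nn_values = model(states).squeeze(1)    # network output, attached to the graph

# Same value loss as in train_model, once attached and once detached.
attached_loss = torch.mean((mcts_values - nn_values) ** 2)
detached_loss = torch.mean((mcts_values - nn_values.detach()) ** 2)

print(attached_loss.requires_grad, detached_loss.requires_grad)  # True False

# After backward(), every parameter that contributed should have a gradient.
attached_loss.backward()
grads_present = all(p.grad is not None for p in model.parameters())
print(grads_present)
```

If the loss reports `requires_grad=False`, or the parameters have no gradient after `backward()`, the graph was probably cut somewhere between the network output and the loss, for example by `.detach()`, `.item()`, or a conversion to NumPy.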
Thanks for spotting this. I’ve changed the model (edited in the post); however, my model still isn’t learning. The model is trained using the loop below:
conv_layer = list(self.model.parameters())
for _ in range(5):
    self.simulation()
    self.mcts_values()
    train_model(self) # The model is trained in this fn
    print(list(self.model.parameters()) == conv_layer)
    print("")
This just prints “True” 5 times. I’m assuming this means that the model weights aren’t changing? I’ve set the learning rate to 0.01 with an Adam optimizer.
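No, since you are storing references to the parameters instead of their actual values. The optimizer updates the parameters in place, so both lists point to the same, already updated tensors and the comparison always returns True.
To compare parameters, you would need to clone them, as seen in this example: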
conv_layer = list(self.model.parameters())
w0 = self.model.weight.clone()
for _ in range(5):
    self.simulation()
    self.mcts_values()
    train_model(self) # The model is trained in this fn
    print(list(self.model.parameters()) == conv_layer)
    print((self.model.weight - w0).abs().max())
    print("")
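The difference between storing references and storing clones can be reproduced in isolation. The following is a self-contained sketch using the single `nn.Conv2d(3, 3, 3)` layer mentioned in this thread; the random input and the dummy loss are assumptions made only so that one optimizer step can run.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Conv2d(3, 3, 3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# References: these are the very same Parameter objects the optimizer updates.
refs = list(model.parameters())
# Copies: detached clones that keep the values from before training.
snapshot = [p.detach().clone() for p in model.parameters()]

# One training step on random data with a dummy loss (assumption).
x = torch.randn(2, 3, 8, 8)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# List equality short-circuits on object identity, so this is always True.
same_objects = list(model.parameters()) == refs
# Comparing against the clones shows how much the weights really moved.
max_change = max(
    (p.detach() - s).abs().max().item()
    for p, s in zip(model.parameters(), snapshot)
)
print(same_objects)
print(max_change)
```

Here the reference comparison stays True even though the clone comparison reports a nonzero change, which is why the “True” output on its own says nothing about whether training updated the weights.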
I get back: “AttributeError: ‘ResNet’ object has no attribute ‘weight’”
My minimal code snippet uses a single nn.Conv2d layer, model = nn.Conv2d(3, 3, 3), so you would need to adapt the code and access a valid parameter via a registered module in your ResNet.
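To find a valid parameter in a larger model, `named_parameters()` lists every registered parameter together with its dotted name. The sketch below uses a hypothetical `TinyNet` as a stand-in for the ResNet (its layer names are assumptions); it snapshots all parameters by name and reports the change after one step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyNet(nn.Module):
    """Hypothetical stand-in for the ResNet in this thread."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 4, 3, padding=1)
        self.fc = nn.Linear(4, 1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        return self.fc(x.mean(dim=(2, 3)))  # global average pool, then linear

model = TinyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Dotted names show which attributes hold parameters, e.g. model.conv1.weight.
names = [name for name, _ in model.named_parameters()]
print(names)  # ['conv1.weight', 'conv1.bias', 'fc.weight', 'fc.bias']

# Snapshot every parameter by name before training.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

# One training step on random data (assumption, for illustration only).
x, target = torch.randn(8, 3, 6, 6), torch.randn(8, 1)
loss = torch.mean((model(x) - target) ** 2)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Largest absolute change per parameter tensor.
changes = {
    name: (p.detach() - before[name]).abs().max().item()
    for name, p in model.named_parameters()
}
for name, delta in changes.items():
    print(f"{name}: {delta:.6f}")
```

A parameter whose change stays at exactly zero over several steps is worth a closer look, for example whether it was passed to the optimizer and whether it receives a gradient.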
conv_layer = list(self.model.parameters())
w0 = self.model.conv1.weight.data[0][0].clone()
print(w0, end="\n\n")
for _ in range(5):
    self.simulation()
    self.mcts_values()
    train_model(self) # The model is trained in this fn
    print(list(self.model.parameters()) == conv_layer)
    print((self.model.conv1.weight.data[0][0] - w0).abs().max())
    print("")
print(self.model.conv1.weight.data[0][0])