I have a list of the history of states.

Then, using NumPy, I calculate the **discounted_rewards**.
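For reference, this is roughly how I compute them (a minimal sketch; the `gamma` value and the backward-accumulation form are assumptions, adjust to your setup):

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}, iterating backwards."""
    discounted = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted
```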

Then I multiply the output of the model by **discounted_rewards** using **torch.mm**.
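Roughly like this (a sketch only; the tensor shapes and the negative sign are assumptions, since `torch.mm` requires both operands to be 2-D):

```python
import torch

# Assumed shapes: log_probs is (T, 1), one log-probability per step,
# and discounted is (1, T). torch.mm((1, T), (T, 1)) gives a (1, 1) result.
log_probs = torch.randn(5, 1, requires_grad=True)
discounted = torch.tensor([[1.0, 0.99, 0.98, 0.97, 0.96]])

total_loss = -torch.mm(discounted, log_probs).squeeze()  # 0-dim scalar loss
```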

Then I run:

```
print(self.global_model.state_dict())   # weights before the update
print("total_loss", total_loss)
total_loss.backward()
self.opt.step()
print(self.global_model.state_dict())   # weights after the update
```

Its output is:

```
('dense1.weight', tensor([[ 0.3997, -0.1907,  0.1120,  0.3016],
        [ 0.1156,  0.0646,  0.1802,  0.3558],
        [ 0.0321,  0.2537,  0.0879,  0.2441],
        [-0.2952, -0.0886, -0.3235,  0.3006]])), ('dense1.bias', tensor([ 0.1927,  0.3048, -0.3551, -0.0302])), ('dense2.weig
total_loss tensor(2.5806, dtype=torch.float64, grad_fn=<…>)
('dense1.weight', tensor([[ 0.3997, -0.1907,  0.1120,  0.3016],
        [ 0.1156,  0.0646,  0.1802,  0.3558],
        [ 0.0321,  0.2537,  0.0879,  0.2441],
        [-0.2952, -0.0886, -0.3235,  0.3006]])), ('dense1.bias', tensor([ 0.192
```

(Output truncated; the weights printed before and after `self.opt.step()` are identical.)

and the optimizer is:

```
self.opt = torch.optim.SGD(self.global_model.parameters(),lr = 0.01)
```

So it is not updating the weights. What am I missing?

The model is:

```
self.dense1 = torch.nn.Linear(4,4)
self.dense2 = torch.nn.Linear(4,4)
self.dense3 = torch.nn.Linear(4,4)
self.dense4 = torch.nn.Linear(4,4)
self.probs = torch.nn.Linear(4,2)
self.values = torch.nn.Linear(4, 1)
```
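For context, here is a self-contained sketch of how these layers could be wired into a module (the `forward` pass, activations, and class name are my assumptions, not part of the original code):

```python
import torch

class ActorCritic(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dense1 = torch.nn.Linear(4, 4)
        self.dense2 = torch.nn.Linear(4, 4)
        self.dense3 = torch.nn.Linear(4, 4)
        self.dense4 = torch.nn.Linear(4, 4)
        self.probs = torch.nn.Linear(4, 2)   # action logits -> probabilities
        self.values = torch.nn.Linear(4, 1)  # state-value head

    def forward(self, x):
        x = torch.relu(self.dense1(x))
        x = torch.relu(self.dense2(x))
        x = torch.relu(self.dense3(x))
        x = torch.relu(self.dense4(x))
        return torch.softmax(self.probs(x), dim=-1), self.values(x)
```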