I have a list of the history of states.

Then, using NumPy, I calculate the **discounted_rewards**.
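For reference, this is roughly how I compute them (a minimal sketch; the `gamma` value and the backward-accumulation form are assumptions, adjust to your setup):

```python
import numpy as np

def discount_rewards(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}, iterating backwards."""
    discounted = np.zeros(len(rewards), dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        discounted[t] = running
    return discounted
```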

Then I multiply the output of the model by **discounted_rewards** using **torch.mm**.
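Roughly like this (a sketch only; the tensor shapes and the negative sign are assumptions, since `torch.mm` requires both operands to be 2-D):

```python
import torch

# Assumed shapes: log_probs is (T, 1), one log-probability per step,
# and discounted is (1, T). torch.mm((1, T), (T, 1)) gives a (1, 1) result.
log_probs = torch.randn(5, 1, requires_grad=True)
discounted = torch.tensor([[1.0, 0.99, 0.98, 0.97, 0.96]])

total_loss = -torch.mm(discounted, log_probs).squeeze()  # 0-dim scalar loss
```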

Then I run:

```
print(self.global_model.state_dict())   # weights before the update
print("total_loss", total_loss)
total_loss.backward()
self.opt.step()
print(self.global_model.state_dict())   # weights after the update
```

Its output is:

```
('dense1.weight', tensor([[ 0.3997, -0.1907,  0.1120,  0.3016],
        [ 0.1156,  0.0646,  0.1802,  0.3558],
        [ 0.0321,  0.2537,  0.0879,  0.2441],
        [-0.2952, -0.0886, -0.3235,  0.3006]])), ('dense1.bias', tensor([ 0.1927,  0.3048, -0.3551, -0.0302])), ('dense2.weig
total_loss tensor(2.5806, dtype=torch.float64, grad_fn=<…>)
('dense1.weight', tensor([[ 0.3997, -0.1907,  0.1120,  0.3016],
        [ 0.1156,  0.0646,  0.1802,  0.3558],
        [ 0.0321,  0.2537,  0.0879,  0.2441],
        [-0.2952, -0.0886, -0.3235,  0.3006]])), ('dense1.bias', tensor([ 0.192
```

(Output truncated; the weights printed before and after `self.opt.step()` are identical.)

and the optimizer is:

```
self.opt = torch.optim.SGD(self.global_model.parameters(),lr = 0.01)
```

So it is not updating the weights. What am I missing?

The model is:

```
self.dense1 = torch.nn.Linear(4,4)
self.dense2 = torch.nn.Linear(4,4)
self.dense3 = torch.nn.Linear(4,4)
self.dense4 = torch.nn.Linear(4,4)
self.probs = torch.nn.Linear(4,2)
self.values = torch.nn.Linear(4, 1)
```
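For context, here is a self-contained sketch of how these layers could be wired into a module (the `forward` pass, activations, and class name are my assumptions, not part of the original code):

```python
import torch

class ActorCritic(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.dense1 = torch.nn.Linear(4, 4)
        self.dense2 = torch.nn.Linear(4, 4)
        self.dense3 = torch.nn.Linear(4, 4)
        self.dense4 = torch.nn.Linear(4, 4)
        self.probs = torch.nn.Linear(4, 2)   # action logits -> probabilities
        self.values = torch.nn.Linear(4, 1)  # state-value head

    def forward(self, x):
        x = torch.relu(self.dense1(x))
        x = torch.relu(self.dense2(x))
        x = torch.relu(self.dense3(x))
        x = torch.relu(self.dense4(x))
        return torch.softmax(self.probs(x), dim=-1), self.values(x)
```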