Edit: Sorry, I’m a bit confused about all of these concepts and the code.

Could you guys help me figure out what I am doing wrong?

Note the entire code is here: https://gist.github.com/Willtl/4250d6391d40397b7d7d335510190802

I’m working with reinforcement learning. At each step t, I feed the current state (the input) to the network; given the output, I pick an action and move to the new state. I then feed the new state through the network again to get the maximal value from that state (this is used to calculate the target).

```
input: tensor([[2., 0., 1., 1.]])
output: tensor([[-0.7809, 0.4925, 0.1809, -0.4934]], grad_fn=<AddmmBackward>)
target tensor([[-0.7809, 0.4357, 0.1809, -0.4934]], grad_fn=<CopySlices>)
q_target tensor(0.4357, grad_fn=<AddBackward0>)
loss tensor(0.0008, grad_fn=<MeanBackward0>)
```
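To make the setup concrete, here is a minimal sketch of the target computation described above. The `gamma` and `reward` values are placeholders I picked so the arithmetic reproduces the printed `q_target` of 0.4357 (my actual values are in the gist):

```python
import torch

# Placeholder hyperparameters, NOT necessarily the real ones:
gamma = 0.9   # discount factor (assumption)
reward = 0.1  # reward for the taken action (assumption)

output = torch.tensor([[-0.7809, 0.4925, 0.1809, -0.4934]])  # Q(s, ·)
next_output = torch.tensor([[0.2000, 0.3730, -0.1000, 0.0500]])  # Q(s', ·)
action = 1  # index of the action taken in state s

# Bellman target: copy the network output and replace only the
# taken action's entry with reward + gamma * max_a' Q(s', a')
target = output.clone()
target[0, action] = reward + gamma * next_output.max()
# target[0, 1] = 0.1 + 0.9 * 0.3730 = 0.4357, matching the print above
```

Only the entry for the chosen action is changed; all other entries stay equal to the network's own output, so they contribute zero error.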

After calling **ANN.backward()** (see the ANN class below), if I print the output obtained with the same “example” that I just used to train the model, the values are exactly the same. See below:

```
input: tensor([[2., 0., 1., 1.]])
output: tensor([[-0.7809, 0.4925, 0.1809, -0.4934]], grad_fn=<AddmmBackward>)
```

Q1: Is it ok in this situation to use MSELoss?

Q2: Why are the results not changing? Is it OK to feed the network multiple times, as I am doing? (The second feed is not supposed to be a training step; it is just to get the results given the new state.)

Q3: MSE should not be used here, right? It would lead to a single value being propagated for all the outputs. Would it not be more correct to have a tensor like `loss = target - output`, so that we know the error with respect to each output?
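To check my own understanding of Q3, I tried MSE on the two printed tensors in isolation. Since target and output differ only at index 1, the gradient of the MSE loss is zero everywhere except that one component, so the scalar loss does not smear a single error across all outputs:

```python
import torch
import torch.nn as nn

# The exact tensors printed above
output = torch.tensor([[-0.7809, 0.4925, 0.1809, -0.4934]], requires_grad=True)
target = torch.tensor([[-0.7809, 0.4357, 0.1809, -0.4934]])

loss = nn.MSELoss()(output, target)  # mean over all 4 elements
loss.backward()

# loss = (0.4925 - 0.4357)**2 / 4 ≈ 0.0008, matching the print above.
# Gradient d(loss)/d(output) = 2 * (output - target) / 4:
# nonzero only at index 1, exactly zero for the matching entries.
print(loss.item())
print(output.grad)
```

So the reduction to a scalar is only for reporting; backpropagation still sees the per-element residuals.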

```
class ANN(nn.Module):
    # ANN's layer architecture
    def __init__(self):
        # Initialize superclass
        super().__init__()
        # Fully connected layers
        self.inputs = 4
        self.outputs = 4
        self.l1 = nn.Linear(self.inputs, 4)  # To disable bias use bias=False
        self.l2 = nn.Linear(4, 4)
        self.l3 = nn.Linear(4, 4)
        self.l4 = nn.Linear(4, self.outputs)
        self.optimizer = optim.Adam(self.parameters(), lr=learning_rate)
        self.loss_criterion = nn.MSELoss()

    # Define how the data passes through the layers
    # (note: spelled "forward", not "foward", so nn.Module's __call__ works)
    def forward(self, x):
        # Pass x through the hidden layers, activating with ReLU
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        # Linear output layer
        x = self.l4(x)
        return x

    def feed(self, x):
        return self.forward(x)

    # Train the network with one state
    def backward(self, output, target):
        # Zero gradients
        self.optimizer.zero_grad()
        # Calculate loss
        loss = self.loss_criterion(output, target)
        # Perform a backward pass, and update the weights
        loss.backward()
        self.optimizer.step()
        return loss
```
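For reference, here is the standalone repro I am using to check whether a single optimizer step changes the output at all. It uses a throwaway `nn.Sequential` instead of my `ANN` class, and the target value 0.4357 is just the number from the prints above:

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 4))
opt = optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.MSELoss()

x = torch.tensor([[2., 0., 1., 1.]])
before = net(x).detach().clone()  # prediction before training

out = net(x)
target = out.detach().clone()
target[0, 1] = 0.4357  # pretend Bellman target for the taken action

opt.zero_grad()
loss = criterion(out, target)
loss.backward()
opt.step()

after = net(x).detach()  # prediction after one training step
changed = not torch.allclose(before, after)
print("output changed after one step:", changed)
```

If `changed` is False in my real code but True here, the problem is likely in how the network/optimizer objects are wired up (e.g. the optimizer stepping a different set of parameters than the ones producing the output), not in the loss itself.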