Edit: Sorry, I'm a bit confused about all the concepts and the code.
Could you guys help me figure out what I am doing wrong?
Note the entire code is here: https://gist.github.com/Willtl/4250d6391d40397b7d7d335510190802
I'm working with reinforcement learning: at each step t, I feed the current state (the input) to the network; given the output, I pick an action and move to the new state. Then I feed the new state through the network again to get the maximal value from that new state (this is used to compute the target).
input:    tensor([[2., 0., 1., 1.]])
output:   tensor([[-0.7809, 0.4925, 0.1809, -0.4934]], grad_fn=<AddmmBackward>)
target:   tensor([[-0.7809, 0.4357, 0.1809, -0.4934]], grad_fn=<CopySlices>)
q_target: tensor(0.4357, grad_fn=<AddBackward0>)
loss:     tensor(0.0008, grad_fn=<MeanBackward0>)
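For reference, this is roughly how I build the target (a sketch; model, state, next_state, reward, and gamma are placeholders for the variables in the gist):

    # One Q-learning step (illustrative names, not the exact gist code)
    output = model.feed(state)                 # Q-values for the current state
    action = torch.argmax(output).item()       # pick the greedy action
    next_output = model.feed(next_state)       # Q-values for the new state
    q_target = reward + gamma * torch.max(next_output)  # Bellman target
    target = output.clone()                    # copy the prediction (hence grad_fn=<CopySlices>)
    target[0][action] = q_target               # overwrite the chosen action's value
    loss = model.backward(output, target)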
After ANN.backward() (see below in the ANN class), if I print the output obtained for the same "example" that I just used to train the model, the values are exactly the same. See below:
input:  tensor([[2., 0., 1., 1.]])
output: tensor([[-0.7809, 0.4925, 0.1809, -0.4934]], grad_fn=<AddmmBackward>)
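To double-check on my side, I can also snapshot a layer's weights before and after the training step (a sketch, not in the gist):

    before = model.l1.weight.clone()           # snapshot the first layer's weights
    model.backward(output, target)             # one training step
    print(torch.allclose(before, model.l1.weight))  # True would mean the weights did not move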
Q1: Is it OK to use MSELoss in this situation?
Q2: Why are the results not changing? And is it OK to feed the network multiple times the way I am doing? The second feed is not supposed to count as a training step; it is just to get the values for the new state (see the sketch below).
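To be concrete about what I mean by the second feed (just an illustration; wrapping it in torch.no_grad() makes explicit that it is not a training step):

    # Evaluation-only forward pass: no gradients are tracked, nothing is updated
    with torch.no_grad():
        next_q = model.feed(next_state)
    max_next_q = torch.max(next_q)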
Q3: MSE should not be used here, right? It collapses everything into a single value that is then propagated for all the outputs. Wouldn't it be more correct to have a tensor like loss = target - output, so that we know the error with respect to each individual output?
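For example, with the numbers from the log above, MSELoss averages the squared errors into one scalar, while reduction='none' would keep the error per output:

    import torch
    import torch.nn as nn

    output = torch.tensor([[-0.7809, 0.4925, 0.1809, -0.4934]])
    target = torch.tensor([[-0.7809, 0.4357, 0.1809, -0.4934]])

    nn.MSELoss()(output, target)                  # tensor(0.0008) - one scalar
    nn.MSELoss(reduction='none')(output, target)  # squared error per output
    target - output                               # signed error per output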
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

learning_rate = 0.001  # assumed value here; it is defined elsewhere in the gist


class ANN(nn.Module):
    # ANN's layer architecture

    def __init__(self):
        # Initialize superclass
        super().__init__()
        # Fully connected layers
        self.inputs = 4
        self.outputs = 4
        self.l1 = nn.Linear(self.inputs, 4)  # To disable bias use bias=False
        self.l2 = nn.Linear(4, 4)
        self.l3 = nn.Linear(4, 4)
        self.l4 = nn.Linear(4, self.outputs)
        self.optimizer = optim.Adam(self.parameters(), lr=learning_rate)
        self.loss_criterion = nn.MSELoss()

    # Define how the data passes through the layers
    def forward(self, x):  # note: spelled "foward" in the gist
        # Pass x through each hidden layer and activate with rectified linear unit
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        # Linear output layer
        x = self.l4(x)
        return x

    def feed(self, x):
        outputs = self.forward(x)
        return outputs

    # Train the network with one (output, target) pair
    def backward(self, output, target):
        # Zero gradients
        self.optimizer.zero_grad()
        # Calculate loss
        loss = self.loss_criterion(output, target)
        # Perform a backward pass, and update the weights
        loss.backward()
        self.optimizer.step()
        return loss
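And this is roughly how one step looks when calling the class (a sketch with made-up values, just to show the shapes):

    model = ANN()
    x = torch.tensor([[2., 0., 1., 1.]])
    out = model.feed(x)              # shape [1, 4]: one value per action
    tgt = out.clone()                # copy of the prediction (grad_fn=<CopySlices> after the write)
    tgt[0][1] = 0.4357               # overwrite one action's value
    print(model.backward(out, tgt))  # MSE loss for this step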