This is my first time posting. I wrote a very basic feed-forward network in an attempt to optimize a function, but I noticed it was not learning. So I simplified the target function as much as possible while keeping the same essential architecture, and it's still not learning.
Here is the network class:
```python
# Imports assumed from the usage below (they were not part of the original snippet):
import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim


class NeuralNet(nn.Module):
    def __init__(self, alpha, in_dims, fc1_dims, fc2_dims, out_dims):
        super(NeuralNet, self).__init__()
        self.in_dims = in_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.out_dims = out_dims

        self.fc1 = nn.Linear(self.in_dims, self.fc1_dims)
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.fc3 = nn.Linear(self.fc2_dims, self.out_dims)

        self.optimizer = optim.Adam(self.parameters(), lr=alpha)
        # self.optimizer = optim.SGD(self.parameters(), lr=alpha, momentum=0.8)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)

    def forward(self, state):
        x = T.relu(self.fc1(state))
        x = T.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def choose_action(self, obs):
        # Wrap the observation in a new tensor, forward it, and return the
        # ReLU'd output as a nested Python list.
        obs = T.tensor([obs], dtype=T.float, requires_grad=True).to(self.device)
        action = T.relu(self.forward(obs))
        return action.tolist()

    def learn(self, loss):
        # Wrap the incoming loss value in a fresh tensor, backprop, and step.
        self.optimizer.zero_grad()
        loss = T.tensor([loss], dtype=T.float, requires_grad=True).to(self.device)
        loss.backward()
        self.optimizer.step()

    def loss_func(self, target, arrow):
        # Loss computed in NumPy: (sum(arrow)**2 - sum(target))**2
        target = np.sum(target)
        arrow = np.sum(arrow) ** 2
        diff = (arrow - target) ** 2
        loss = diff
        return loss
```
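For reference, a single call to choose_action with this class looks roughly like the sketch below; the sample value and the output are hypothetical, just to show that the method returns a nested Python list:

```python
# Hypothetical usage of the NeuralNet class above (values are made up):
net = NeuralNet(alpha=1e-5, in_dims=1, fc1_dims=32, fc2_dims=32, out_dims=1)
sample = np.random.randint(0, 500, size=1) * .1   # e.g. array([23.4])
out = net.choose_action(sample)                   # e.g. [[0.07]], a nested list of shape (1, 1)
```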
and here is the driver code:
```python
import matplotlib.pyplot as plt  # assumed import for the plotting at the end

if __name__ == '__main__':
    n_inputs = 1
    n_outputs = 1
    layer1 = 32
    layer2 = 32
    agent = NeuralNet(alpha=1e-5, in_dims=n_inputs,
                      fc1_dims=layer1, fc2_dims=layer2,
                      out_dims=n_outputs)

    n_games = int(100)
    done = False
    n_epochs = 50
    learn_iters = 0
    n_steps = 0
    score_history = []
    avg_score = 0
    best_score = 0

    for i in range(n_games):
        score = 0
        done = False
        loss = 0
        learn_iters = 0
        while not done:
            input_val = np.random.randint(0, 500, size=1) * .1
            act = agent.choose_action(input_val)
            loss = agent.loss_func(input_val, act)
            score += loss
            agent.learn(loss)
            learn_iters += 1
            if learn_iters > 1e4:
                done = True
                print(i)

        score_history.append(score)
        avg_score = np.mean(score_history[-100:])
        if score > best_score:
            best_score = score

        print('episode', i, 'score %.1f' % score,
              'avg score %.1f' % avg_score,
              'best score %.1f' % best_score)

    x = [i + 1 for i in range(len(score_history))]
    print('learned:', learn_iters, ' times')
    plt.figure()
    plt.plot(x, score_history)
    plt.show()
```
As you can see, the NN is attempting to approximate y = x^2. `input_val` is a random number from 0 to 50 with one decimal place. The network produces an output via

```python
act = agent.choose_action(input_val)
```

and the loss is then calculated as the squared difference between input_val^2 and act. I am attempting to minimize this difference.
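For concreteness, one training step through the loss_func posted above works out to something like the following; the input and the network output are made-up values, just to illustrate the arithmetic:

```python
# Hypothetical single step through the posted loss_func (values made up):
input_val = np.array([3.0])         # sampled x
act = [[2.0]]                       # a made-up network output

target = np.sum(input_val)          # 3.0
arrow = np.sum(act) ** 2            # 2.0 ** 2 = 4.0
loss = (arrow - target) ** 2        # (4.0 - 3.0) ** 2 = 1.0, a NumPy scalar
```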
However, no matter how many training steps I run, and regardless of the learning rate, layer sizes, optimizer, or activation functions, I can't get the average score to go down. The model is just not learning.
Here is a graph of the score history for 100 ‘games’
For this particular run, I called agent.learn() a total of 10001 times, and there's barely any movement. Heck, the score even went up!
I'm convinced there's something basic I'm missing, but I've scoured a ton of torch-based RL implementations and I can't figure out why my particular network won't train on a function as basic as y = x^2.
Any help is appreciated. Thank you!