Super basic Feedforward network does not learn


This is my first time posting. I wrote a super basic feedforward network to optimize a function, but I noticed that it was not learning. So I simplified the function as much as possible while keeping the same essential architecture, and it’s still not learning.

Here is the network class:

import numpy as np
import torch as T
import torch.nn as nn
import torch.optim as optim

class NeuralNet(nn.Module):
    def __init__(self, alpha, in_dims,
                 fc1_dims, fc2_dims,out_dims):
        super(NeuralNet, self).__init__()
        self.in_dims = in_dims
        self.fc1_dims = fc1_dims
        self.fc2_dims = fc2_dims
        self.out_dims = out_dims
        self.fc1 = nn.Linear(self.in_dims, self.fc1_dims)
        self.fc2 = nn.Linear(self.fc1_dims, self.fc2_dims)
        self.fc3 = nn.Linear(self.fc2_dims, self.out_dims)

        self.optimizer = optim.Adam(self.parameters(), lr=alpha)
        # self.optimizer = optim.SGD(self.parameters(),lr=alpha,momentum=0.8)
        self.device = T.device('cuda:0' if T.cuda.is_available() else 'cpu')
        self.to(self.device)  # keep the parameters on the same device as the inputs

    def forward(self, state):
        x = T.relu(self.fc1(state))
        x = T.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def choose_action(self, obs):
        obs = T.tensor([obs],dtype=T.float,requires_grad=True).to(self.device)
        action = T.relu(self.forward(obs))
        return action.tolist()

    def learn(self, loss):
        loss = T.tensor([loss], dtype=T.float,requires_grad=True).to(self.device)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

    def loss_func(self,target,arrow):
        target = np.sum(target)**2   # input_val^2, the value the network should output
        arrow = np.sum(arrow)        # the network's action
        diff = (arrow-target)**2
        loss = diff
        return loss

And here is the driver code:

if __name__ == '__main__':
    n_inputs = 1
    n_outputs = 1
    layer1 = 32
    layer2 = 32
    agent = NeuralNet(alpha=1e-5, in_dims=n_inputs, fc1_dims=layer1,
                      fc2_dims=layer2, out_dims=n_outputs)
    n_games = int(100)
    done = False
    n_epochs = 50

    learn_iters = 0
    n_steps = 0

    score_history = []
    avg_score = 0
    best_score = 0

    for i in range(n_games):
        score = 0
        done = False
        loss = 0
        learn_iters = 0
        while not done:
            input_val = np.random.randint(0,500,size=1)*.1
            act = agent.choose_action(input_val)
            loss = agent.loss_func(input_val,act)
            agent.learn(loss)
            learn_iters += 1
            score += loss
            if learn_iters > 1e4:
                done = True

        score_history.append(score)
        avg_score = np.mean(score_history[-100:])
        if score > best_score:
            best_score = score
        print('episode', i, 'score %.1f' % score, 
                'avg score %.1f' % avg_score,
                'best score %.1f' % best_score)
    x = [i+1 for i in range(len(score_history))]
    print('learned:',learn_iters,' times')


As you can see, the NN is attempting to approximate y = x^2. “input_val” is a random number from 0.0 to 49.9 in steps of 0.1. The network outputs

act = agent.choose_action(input_val) 

Then the loss is calculated as the squared difference between input_val^2 and act, and I am attempting to minimize it.

However, no matter what I try (number of training steps, learning rate, layer sizes, optimizer, activation functions), I can’t get the average score to go down. The model is just not learning.

Here is a graph of the score history for 100 ‘games’:

[Figure: score history for 100 ‘games’]


For this particular run, I called agent.learn() a total of 10,001 times, and there’s barely any movement. Heck, the score even went up!

I’m convinced there’s something basic I’m missing, but I’ve scoured a ton of torch-based RL implementations and I can’t seem to find why my particular network isn’t training for such a basic function as y=x^2.

Any help is appreciated. Thank you!

You are detaching the loss tensor by re-wrapping it into a new tensor in:

loss = T.tensor([loss], dtype=T.float,requires_grad=True).to(self.device)

Just remove that line and call loss.backward() directly, and it should work.
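A minimal, self-contained sketch (not from the thread) of why the re-wrapping line stops training. A one-layer nn.Linear stands in for NeuralNet, and the loss value is copied into a new tensor with requires_grad=True, roughly the way learn() does it:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(1, 1)          # stand-in for NeuralNet
x = torch.tensor([[2.0]])

# Loss computed from the model output with torch ops: autograd records the graph.
loss = ((model(x) - x ** 2) ** 2).sum()
print(loss.grad_fn is not None)  # True

# Re-wrapping copies only the numeric value into a brand-new leaf tensor,
# which has no connection to the model's parameters.
rewrapped = torch.tensor([loss.item()], requires_grad=True)
rewrapped.sum().backward()
grad_after_rewrap = model.weight.grad
print(grad_after_rewrap)         # None: nothing reached the model

# Calling backward() on the original loss does reach the parameters.
loss.backward()
print(model.weight.grad)         # now a (1, 1) gradient tensor
```

Because the re-wrapped tensor is a fresh leaf, optimizer.step() has no gradients to apply, which would explain why the score never moves no matter how many times learn() is called.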

EDIT: I just realized that you are detaching the computation graph in multiple places.
Generally, you need to keep all operations in PyTorch and avoid numpy etc., since Autograd won’t be able to track operations from 3rd-party libraries.
In particular, keep action as a tensor and don’t convert it to a list. Also, use torch.sum and other PyTorch operations in the loss calculation and remove the numpy operations.
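Putting those suggestions together, here is a rough sketch of the y = x^2 experiment with every step kept in PyTorch. It is an illustration under a few assumptions of my own, not the poster's updated code: an nn.Sequential with the same 1-32-32-1 ReLU layout replaces NeuralNet, inputs are sampled in mini-batches of 64 instead of one at a time, and the learning rate is 1e-3 rather than 1e-5.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Same layout as the question's network: 1 -> 32 -> 32 -> 1 with ReLU.
model = nn.Sequential(
    nn.Linear(1, 32), nn.ReLU(),
    nn.Linear(32, 32), nn.ReLU(),
    nn.Linear(32, 1),
)
optimizer = optim.Adam(model.parameters(), lr=1e-3)  # assumed lr, not the thread's 1e-5

# Fixed evaluation grid 0.0, 0.1, ..., 49.9 (same range as input_val).
x_eval = torch.arange(500, dtype=torch.float32).unsqueeze(1) * 0.1
y_eval = x_eval ** 2

def eval_loss():
    # No graph is needed just to measure progress.
    with torch.no_grad():
        return nn.functional.mse_loss(model(x_eval), y_eval).item()

initial_loss = eval_loss()
for _ in range(2000):
    # Sample inputs directly as a tensor: no numpy round trip.
    x = torch.randint(0, 500, (64, 1)).float() * 0.1
    target = x ** 2
    pred = model(x)                       # stays a tensor, graph intact
    loss = ((pred - target) ** 2).mean()  # torch ops only
    optimizer.zero_grad()
    loss.backward()                       # gradients reach every parameter
    optimizer.step()

final_loss = eval_loss()
print(f"eval MSE before: {initial_loss:.1f}, after: {final_loss:.1f}")
```

The important part is that pred, target and loss are all produced by torch operations, so loss.backward() can reach the weights; the mini-batching and the larger learning rate are there mainly to make progress visible in fewer steps.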

Thanks so much, that actually makes a lot of sense.

This implementation is a super simplified version of a more complex one, which interacts with an environment class similar to an OpenAI Gym env. A couple of examples from the actual use case:

The NN outputs n_outputs values, which are passed to an env.interpret_action() function that parses them and uses them to compute a reward. This is similar to how a Gym environment might take in a discrete action (in which case choose_action would return action.item()), but I have multiple actions I need to parse.

EDIT: For a concrete example, say the output layer outputs 10 values and I want to apply torch.sin(output[0:4]) and torch.sigmoid(output[4:]). Will that detach as well? If so, how would I avoid that problem?

Is there any way I can do those operations without unintentionally detaching? Can I detach, do the math I need to do, and then re-attach without affecting the graph?

How do Gym environments handle passing states and actions back and forth without detaching them from the graph?

No, you won’t be able to re-attach an already detached tensor to a computation graph.
You could check this tutorial for an RL example, or take a look at torchrl, which most likely also shows a few use cases that might be similar to yours (I haven’t played around with torchrl yet).
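A small sketch of both points (my own illustration, not from the thread; the 3-input, 10-output nn.Linear is only a stand-in for the real network, and the sum-based loss is arbitrary). Slicing the output and applying torch.sin / torch.sigmoid keeps the graph intact, while converting the values to a Python list and wrapping them again produces a new leaf tensor with no link to the model:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 10)           # stand-in for a network with 10 outputs
obs = torch.randn(1, 3)

output = model(obs)[0]             # shape (10,)
angles = torch.sin(output[0:4])    # slicing and torch.sin are tracked by autograd
gates = torch.sigmoid(output[4:])  # so is torch.sigmoid
print(angles.grad_fn is not None, gates.grad_fn is not None)  # True True

# A scalar built from these with torch ops backpropagates into the model.
loss = angles.sum() + gates.sum()
loss.backward()
print(model.weight.grad is not None)  # True

# A detached copy has no history. Wrapping its values in a new tensor with
# requires_grad=True starts a fresh graph instead of re-attaching the old one.
values = output.detach().tolist()
fresh = torch.tensor(values, requires_grad=True)
print(fresh.grad_fn is None)       # True: a new leaf, unconnected to model
```

So the sin/sigmoid post-processing can stay inside the graph, as long as it is done with torch operations on the tensor itself rather than on a list or numpy copy.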

Update on my post in case anyone else runs into this issue:

The posted solution above by @ptrblck did the trick. All operations need to be performed in tensor form.

I’m not sure how OpenAI Gym manages passing data back and forth, but for the NN implementation above, defining input_val as a tensor with requires_grad=True and then using only tensor operations throughout the program flow fixed the learning issue. I will post updated code and results in an edit shortly.

Thanks for the help!