What's the point of an activation function on the output layer for a regression problem?

For instance, a DQN selects the action with the highest Q-value.
Is there any point in processing the output layer, e.g. with a .softmax()?

Sigmoid maps values into the range (0, 1). This is typically done for probabilities and works well with certain loss functions (e.g. binary cross-entropy). Perhaps the person who built the DQN just did so out of habit. I've also seen dropout used in DQNs fairly often, but I've never seen anyone demonstrate that it actually benefits those networks.

Check your reward function. Q-values are estimates of discounted returns (sums of rewards), so if your rewards, and therefore your targets, can fall outside the range 0 to 1, a sigmoid on the output makes those targets unreachable. In that case you'd do well to remove the final sigmoid.
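For illustration, here's a minimal PyTorch sketch of what that looks like (the layer sizes, names, and dimensions are my own assumptions, not from the original network): the final layer is a plain nn.Linear with no activation, so the predicted Q-values are unbounded, and greedy action selection is just an argmax over the raw outputs.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Q-network with a linear (activation-free) output head."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # no sigmoid/softmax: raw Q-values
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # One unbounded Q-value per action.
        return self.net(state)

# Hypothetical dimensions for a small control task.
q_net = QNetwork(state_dim=4, n_actions=2)
state = torch.randn(1, 4)
q_values = q_net(state)          # unbounded regression outputs
action = q_values.argmax(dim=1)  # greedy action selection
```

Since only the argmax matters for picking the action, a monotonic function like sigmoid or softmax wouldn't change which action wins anyway; it would only constrain the regression targets during training.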
