I am implementing an A2C algorithm for RL. For this, I need to train the actor, whose policy is a Gaussian distribution. The mean and variance are therefore two of my network outputs, and both are continuous numbers. Both need to be restricted to be > 0, and the mean must additionally stay below a separate, adaptable threshold.
My third output should take just one of three values, which is encoded into the decision making of the algorithm. Here, I am not sure whether I can keep it in the same neural net or should create a separate one.
Anyway, the main question is how I can restrict the above-mentioned outputs so that they fulfill my conditions.
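
To make the conditions concrete, here is a minimal sketch of one direction I could imagine. It is framework-free Python and has not been verified in training. It assumes the network produces unconstrained raw numbers, which are then squashed: a scaled sigmoid for the mean, a softplus for the variance, and a softmax over three logits for the third output. All names (`constrain_outputs`, `raw_mean`, `raw_var`, `raw_logits`, `mean_max`, `eps`) are hypothetical and only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    # Numerically stable logistic function, mathematically in (0, 1).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softplus(x: float) -> float:
    # Stable form of log(1 + exp(x)); mathematically always > 0.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def softmax(logits):
    # Subtract the max before exponentiating to avoid overflow.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def constrain_outputs(raw_mean, raw_var, raw_logits, mean_max, eps=1e-6):
    """Map unconstrained raw outputs to the restricted ranges.

    mean_max is the adaptable upper threshold for the mean (assumed > 0).
    eps is a small assumed floor that keeps the variance away from zero.
    """
    mean = mean_max * sigmoid(raw_mean)  # 0 < mean < mean_max
    var = softplus(raw_var) + eps        # var > 0
    probs = softmax(raw_logits)          # probabilities over the three values
    return mean, var, probs


mean, var, probs = constrain_outputs(-3.0, -5.0, [0.1, 2.0, -1.0], mean_max=4.0)
```

Because `mean_max` only enters as a multiplier, the threshold could be changed between calls without touching the network weights. Whether the three logits come from the same network as the mean and variance or from a separate one does not matter for this transformation, so the sketch leaves that question open.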