How to optimize the weights and biases in your neural network so that it outputs only positive values?

Hi! I am using a neural network with the following layers: (784, 784), (784, 784), (784, 2). I want the values in the last layer to be positive, but I do not want to use an activation function such as ReLU, which would solve the problem. What I want to know is whether there is a way to guide the weights and biases (in pytorch) in that last layer so as to get only positive values while still using nn.Linear(). Any advice would be appreciated. Thanks in advance!!

Hi Benja!

I assume you mean that you want the outputs of your network to be positive,
rather than the weight and bias parameters of your network to be positive.
(If you want your parameters to be positive, similar comments will apply, but
the details will be different.)

One simple approach would be to add a penalty to your loss function (where
`pred` is the output of your network):

```
loss = loss_fn (pred, target)
penalty = (torch.nn.functional.relu (-pred)**2).sum()   # for example
loss_with_penalty = loss + alpha * penalty
```

Note that `penalty` will not force the elements of `pred` to be positive, but
it will encourage them to be positive. However, by increasing the value
of the penalty-weight, `alpha`, you can push `pred` harder and harder not
to be negative.
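To make this concrete, here is a minimal, self-contained sketch of a training step using such a penalty. The network, `loss_fn`, `alpha`, and the random data are all placeholders standing in for whatever you actually use:

```python
import torch

# Stand-in for the (784, 784), (784, 784), (784, 2) network described above.
model = torch.nn.Sequential(
    torch.nn.Linear(784, 784), torch.nn.ReLU(),
    torch.nn.Linear(784, 784), torch.nn.ReLU(),
    torch.nn.Linear(784, 2),
)
loss_fn = torch.nn.MSELoss()            # placeholder loss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
alpha = 10.0                            # penalty weight; increase to push pred harder

x = torch.randn(8, 784)                 # dummy batch
target = torch.rand(8, 2)               # dummy (positive) targets

for _ in range(5):
    pred = model(x)
    loss = loss_fn(pred, target)
    # quadratic penalty on any negative elements of pred
    penalty = (torch.nn.functional.relu(-pred) ** 2).sum()
    loss_with_penalty = loss + alpha * penalty
    optimizer.zero_grad()
    loss_with_penalty.backward()
    optimizer.step()
```

Raising `alpha` (or scheduling it upward during training) trades off fitting the target against suppressing negative outputs.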

To perform an “official” constrained optimization that requires `pred` to be
non-negative (an inequality constraint), you can add slack variables for the
elements of `pred`:

```
pred_with_slack = pred - slack**2
```

and constrain (with an equality constraint) `pred_with_slack` to be zero
element by element (where `slack` has the same shape as `pred`). Because
`slack**2` can be positive, `pred` is free to become positive, but because
`slack**2` can never be negative, the constraint on `pred_with_slack`
prevents `pred` from becoming negative.

You can use Lagrange multipliers to perform such an optimization where
`pred_with_slack` is constrained to be zero. However, because the optimum
for the Lagrange-multiplier optimization occurs at a saddle point (rather than
at a minimum), you can’t use gradient descent to perform the optimization
without tweaking it so that you use, in effect, gradient ascent on the Lagrange
multiplier.
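Here is a sketch of what this slack-variable, Lagrange-multiplier scheme could look like in pytorch. The names are illustrative, a single `Linear` stands in for the full network, and the "ascent" on the multipliers is implemented simply by negating their gradient before the (descent-based) optimizer step:

```python
import torch

model = torch.nn.Linear(784, 2)          # stand-in for the full network
loss_fn = torch.nn.MSELoss()             # placeholder loss

x = torch.randn(8, 784)                  # dummy batch
target = torch.rand(8, 2)

# one slack variable and one Lagrange multiplier per element of pred
slack = torch.zeros(8, 2, requires_grad=True)
lam = torch.zeros(8, 2, requires_grad=True)

params = list(model.parameters()) + [slack, lam]
optimizer = torch.optim.SGD(params, lr=0.01)

for _ in range(5):
    pred = model(x)
    constraint = pred - slack**2         # constrained to be zero element by element
    lagrangian = loss_fn(pred, target) + (lam * constraint).sum()
    optimizer.zero_grad()
    lagrangian.backward()
    lam.grad.neg_()                      # descent step on a negated gradient = ascent
    optimizer.step()
```

Flipping the sign of `lam.grad` lets an ordinary gradient-descent optimizer perform, in effect, gradient ascent on the multipliers, which is the tweak described above.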

This is explained by @tom, here, and a sound approach to implementing
such a mixed gradient descent / ascent optimization using pytorch (with its
gradient-descent-based optimizers) is given by @t_naumenko, here.

Note that using the Lagrange-multiplier technique will not force `pred` to be
non-negative during the training process; only after the optimization has
converged to its (saddle-point) optimum will `pred` be non-negative.

(You can think of the Lagrange multiplier as being like the `alpha`
penalty-weight in the `loss_with_penalty` optimization approach, except
that the optimization process tunes the penalty-weight automatically.)

As an aside, this naturally raises the question: why not use an activation? Depending on your use
case, it may make perfect sense to pass the output of your network through
something like `relu()` or, perhaps better, `exp()` to ensure positive values.
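For instance (a sketch; `softplus()` would be another common choice), applying `exp()` after the final `nn.Linear()` guarantees strictly positive outputs:

```python
import torch

last = torch.nn.Linear(784, 2)   # stand-in for the final layer
x = torch.randn(8, 784)          # dummy input to that layer
pred = torch.exp(last(x))        # exp() maps any real value to a strictly positive one
```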

Good luck!

K. Frank

Okay, thank you!! I think it works.