Hi Benja!
I assume you mean that you want the outputs of your network to be positive,
rather than the weight and bias parameters of your network to be positive.
(If you want your parameters to be positive, similar comments will apply, but
the details will be different.)
One simple approach would be to add a penalty to your loss function (where
`pred` is the output of your network):

```python
loss = loss_fn (pred, target)
penalty = (torch.nn.functional.relu (-pred)**2).sum()   # for example
loss_with_penalty = loss + alpha * penalty
```
Note that `penalty` will not force the elements of `pred` to be positive, but
it will encourage them to be positive. However, by increasing the value of the
penalty weight, `alpha`, you can push `pred` harder and harder not to be
negative.
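As a minimal runnable sketch of this penalty approach (the toy network, data,
optimizer, and `alpha` value below are illustrative assumptions, not something
from your post):

```python
import torch

torch.manual_seed(0)

# Toy setup: a small network whose outputs we *encourage* to be non-negative.
# All sizes, data, and hyperparameters here are illustrative choices.
model = torch.nn.Linear(4, 3)
x = torch.randn(32, 4)
target = torch.rand(32, 3)          # non-negative targets
loss_fn = torch.nn.MSELoss()
alpha = 10.0                        # penalty weight -- larger pushes harder

opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(1000):
    opt.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, target)
    # penalize the negative elements of pred (zero contribution where pred >= 0)
    penalty = (torch.nn.functional.relu(-pred) ** 2).sum()
    loss_with_penalty = loss + alpha * penalty
    loss_with_penalty.backward()
    opt.step()

pred = model(x)
print(pred.min().item())   # close to, but not necessarily exactly, >= 0
```

After training, the most-negative element of `pred` should be close to zero,
but the penalty alone does not guarantee exact non-negativity.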
To perform an “official” constrained optimization that requires `pred` to be
non-negative (an inequality constraint), you can add slack variables for the
elements of `pred`:

```python
pred_with_slack = pred - slack**2
```
and constrain (with an equality constraint) `pred_with_slack` to be zero
element by element (where `slack` has the same shape as `pred`). Because
`slack**2` can be positive, `pred` is free to become positive, but because
`slack**2` can never be negative, the constraint on `pred_with_slack`
prevents `pred` from becoming negative.
You can use Lagrange multipliers to perform such an optimization where
`pred_with_slack` is constrained to be zero. However, because the optimum
of the Lagrange-multiplier optimization occurs at a saddle point (rather than
at a minimum), you can’t use gradient descent to perform the optimization
without tweaking it so that you use, in effect, gradient ascent on the
Lagrange multiplier.
This is explained by @tom, here, and a sound approach to implementing such a
mixed gradient-descent / gradient-ascent optimization using pytorch (with its
gradient-descent-based optimizers) is given by @t_naumenko, here.
Note that using the Lagrange-multiplier technique will not force `pred` to be
non-negative during the training process; only after the optimization has
converged to its (saddle-point) optimum will `pred` be non-negative.
(You can think of the Lagrange multiplier as being like the `alpha`
penalty weight in the `loss_with_penalty` optimization approach, except
that the optimization process tunes the penalty weight automatically.)
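Putting the slack-variable and multiplier pieces together, a rough sketch of
such a descent / ascent loop might look like the following. This is a
simplified illustration, not @t_naumenko’s implementation; the toy model,
data, learning rates, and iteration count are all assumptions:

```python
import torch

torch.manual_seed(0)

# Toy problem -- all sizes, data, and learning rates are illustrative.
model = torch.nn.Linear(4, 3)
x = torch.randn(16, 4)
target = torch.rand(16, 3)
loss_fn = torch.nn.MSELoss()

slack = torch.zeros(16, 3, requires_grad=True)   # same shape as pred
lam = torch.zeros(16, 3)                         # one multiplier per element

# gradient descent on the "primal" variables (model parameters and slack)
primal_opt = torch.optim.Adam(list(model.parameters()) + [slack], lr=0.01)
lr_dual = 0.01

for _ in range(2000):
    primal_opt.zero_grad()
    pred = model(x)
    constraint = pred - slack**2          # pred_with_slack; want this == 0
    lagrangian = loss_fn(pred, target) + (lam * constraint).sum()
    lagrangian.backward()
    primal_opt.step()                     # descent on model and slack
    with torch.no_grad():
        # gradient ascent on the multiplier; the gradient of the
        # Lagrangian with respect to lam is simply the constraint value
        lam += lr_dual * constraint.detach()
```

Plain descent / ascent like this can oscillate before settling near the
saddle point; in practice, augmented-Lagrangian variants (which add a
quadratic penalty on the constraint) are often used to stabilize such loops.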
As an aside, this naturally raises the question: why not simply make the
outputs positive by construction? Depending on your use case, it may make
perfect sense to pass the output of your network through something like
`relu()` or, perhaps better, `exp()` to ensure positive values.
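For example (a minimal sketch; the layer sizes and input are arbitrary
illustrative choices):

```python
import torch

torch.manual_seed(0)

# Make outputs non-negative (or strictly positive) by construction.
# The layer sizes and input here are arbitrary illustrative choices.
net = torch.nn.Linear(4, 3)
x = torch.randn(8, 4)

pred_relu = torch.nn.functional.relu(net(x))   # non-negative; can be exactly zero
pred_exp = torch.exp(net(x))                   # strictly positive, never zero
```

Note the trade-off: `relu()` can produce exact zeros (and has zero gradient
there), while `exp()` guarantees strictly positive values.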
Good luck!
K. Frank