Hi! I am using a neural network with the following layers: (784,784), (784,784), (784,2). I want the values in the last layer to be positive, but I do not want to use an activation function such as ReLU, which would solve the problem. What I want to know is whether there is a way to guide the weights and bias (in PyTorch) in that last layer so as to get only positive values while using nn.Linear(). Any advice would be appreciated. Thanks in advance!!

Hi Benja!

I assume you mean that you want the outputs of your network to be positive, rather than the weight and bias parameters of your network to be positive. (If you want your parameters to be positive, similar comments will apply, but the details will be different.)

One simple approach would be to add a penalty to your loss function (where `pred` is the output of your network):

```
loss = loss_fn(pred, target)
penalty = (torch.nn.functional.relu(-pred)**2).sum()   # for example
loss_with_penalty = loss + alpha * penalty
```

Note that `penalty` will not force the elements of `pred` to be positive, but it will encourage them to be positive. However, by increasing the value of the penalty weight, `alpha`, you can push `pred` harder and harder not to be negative.
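For concreteness, here is a minimal, self-contained sketch of the penalty approach inside a training loop. The tiny model, random data, `alpha` value, and step count are all illustrative assumptions, not from your setup:

```python
import torch

torch.manual_seed(0)

# Illustrative stand-ins (assumptions): a small linear model, MSE loss,
# and random data with positive targets.
model = torch.nn.Linear(4, 2)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)
target = torch.rand(8, 2)   # positive targets
alpha = 10.0                # penalty weight -- larger pushes pred harder toward >= 0

for step in range(500):
    optimizer.zero_grad()
    pred = model(x)
    loss = loss_fn(pred, target)
    # relu(-pred) is nonzero exactly where pred is negative, so this term
    # penalizes (but does not strictly forbid) negative outputs.
    penalty = (torch.nn.functional.relu(-pred) ** 2).sum()
    (loss + alpha * penalty).backward()
    optimizer.step()
```

After training, most (though not necessarily all) elements of `model(x)` end up non-negative; increasing `alpha` strengthens the effect.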

To perform an “official” constrained optimization that requires `pred` to be non-negative (an *inequality constraint*), you can add slack variables for the elements of `pred`:

```
pred_with_slack = pred - slack**2
```

and constrain (with an *equality constraint*) `pred_with_slack` to be zero element by element (where `slack` has the same shape as `pred`). Because `slack**2` can be positive, `pred` is free to become positive, but because `slack**2` can never be negative, the constraint on `pred_with_slack` prevents `pred` from becoming negative.
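A quick sanity check of the idea: whenever the equality constraint holds, `pred` equals `slack**2`, which cannot be negative regardless of the sign of `slack` itself (the example values below are arbitrary):

```python
import torch

slack = torch.tensor([1.5, 0.0, -0.7])  # slack may take any sign
pred_at_constraint = slack**2           # what pred must equal at the constraint
print(pred_at_constraint)               # tensor([2.2500, 0.0000, 0.4900])
```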

You can use Lagrange multipliers to perform such an optimization where `pred_with_slack` is constrained to be zero. However, because the optimum for the Lagrange-multiplier optimization occurs at a saddle point (rather than at a minimum), you can’t use gradient descent to perform the optimization without tweaking it so that you use, in effect, gradient *ascent* on the Lagrange multiplier.

This is explained by @tom, here, and a sound approach to implementing such a mixed gradient descent / ascent optimization using PyTorch (with its gradient-descent-based optimizers) is given by @t_naumenko, here.
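Here is one minimal sketch of such a mixed descent / ascent optimization on a toy problem (the target values, learning rate, and step count are my own assumptions, and `pred` is optimized directly rather than produced by a network). The multiplier’s gradient is negated after `backward()` so that a standard descent optimizer performs, in effect, ascent on it:

```python
import torch

torch.manual_seed(0)

# Toy problem (assumed for illustration): minimize ||pred - target||^2
# subject to pred >= 0, where one target entry is negative.
target = torch.tensor([1.0, -2.0, 0.5])

pred = torch.zeros(3, requires_grad=True)   # stands in for the network output
slack = torch.ones(3, requires_grad=True)   # slack variables, same shape as pred
lam = torch.zeros(3, requires_grad=True)    # one Lagrange multiplier per element

opt = torch.optim.SGD([pred, slack, lam], lr=0.05)

for step in range(5000):
    opt.zero_grad()
    constraint = pred - slack**2            # pred_with_slack; zero at the optimum
    lagrangian = ((pred - target)**2).sum() + (lam * constraint).sum()
    lagrangian.backward()
    lam.grad.neg_()                         # flip sign: gradient *ascent* on lam
    opt.step()

# pred is pushed toward the non-negative projection of target,
# i.e. approximately [1.0, 0.0, 0.5].
```

For a real network you would put the model parameters (rather than `pred`) in the descent group, keeping `slack` and `lam` as extra optimized tensors.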

Note that using the Lagrange-multiplier technique will not force `pred` to be non-negative *during* the training process; only after the optimization has converged to its (saddle-point) optimum will `pred` be non-negative.

(You can think of the Lagrange multiplier as being like the `alpha` penalty weight in the `loss_with_penalty` approach, except that the optimization process tunes the penalty weight automatically.)

As an aside, this naturally raises the question: why not use an activation function? Depending on your use case, it may make perfect sense to pass the output of your network through something like `relu()` or, perhaps better, `exp()` to ensure positive values.
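For example, the activation can simply be applied after the final `nn.Linear()` (the layer sizes below are a reduced stand-in for your 784-unit stack):

```python
import torch

torch.manual_seed(0)

net = torch.nn.Sequential(
    torch.nn.Linear(8, 8),
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),
)

x = torch.randn(4, 8)
raw = net(x)                 # unconstrained; may contain negative values

pos_relu = torch.relu(raw)   # >= 0, but gradient is zero wherever raw < 0
pos_exp = torch.exp(raw)     # strictly > 0 and smooth everywhere
```

`exp()` keeps nonzero gradients even for negative pre-activations, which is why it can train more easily than `relu()` when outputs must be strictly positive.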

Good luck!

K. Frank

Okay, thank you!! I think it works.