I would like to use a 1-Lipschitz continuous function as an activation function in my network, such as ReLU or LeakyReLU (with a negative slope between 0 and 1).
I would also like to use an activation function similar to PReLU, with a learnable parameter, while keeping the 1-Lipschitz continuous property. Is there a way to add a constraint to torch.nn.PReLU such that the learnable parameter is bounded between 0 and 1?
wt_bnd, the weight parameter passed to the functional version of prelu(),
is bounded between 0 and 1, while the actual trainable parameter, wt, is
unbounded, so you can train it without needing to add constraints somehow
to the optimizer.
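
For reference, here is a minimal sketch of that idea. It assumes a sigmoid is used to map the unbounded wt to wt_bnd (other squashing functions would also work), and the module name BoundedPReLU is made up for illustration:

```python
import torch
import torch.nn.functional as F


class BoundedPReLU(torch.nn.Module):
    """PReLU whose negative slope is squashed into (0, 1). Hypothetical name."""

    def __init__(self, init_wt: float = 0.0):
        super().__init__()
        # wt is the unbounded trainable parameter; sigmoid(0.0) gives a slope of 0.5
        self.wt = torch.nn.Parameter(torch.tensor([init_wt]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumption: sigmoid is the squashing function that bounds the slope
        wt_bnd = torch.sigmoid(self.wt)
        return F.prelu(x, wt_bnd)


act = BoundedPReLU()
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
y = act(x)  # negative inputs are scaled by 0.5, the rest pass through

# Gradients reach the unbounded parameter, so a plain optimizer can train it
y.sum().backward()
```

Because the optimizer only ever sees wt, no clamping or projection step is needed after each update.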
Thanks for your suggestions. To me, this looks like a normalization of the learnable parameter, but can we now say that our activation function is 1-Lipschitz continuous?
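If you’re asking whether our “constrained prelu” is Lipschitz continuous
when viewed as a function of just its input (with its weight parameter
held fixed), the answer is yes. By inspection, prelu (input) is piecewise
linear with slope 1 for input >= 0 and slope weight for input < 0, so with
weight bounded between 0 and 1 its Lipschitz constant is 1.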
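
If you want to sanity-check this numerically, something like the following rough sketch should do. It assumes the bounded weight comes from a sigmoid, picks an arbitrary value for wt, and uses float64 to keep rounding error out of the comparison:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Arbitrary value for the unbounded parameter, squashed into (0, 1) (assumed sigmoid)
wt = torch.tensor([2.0], dtype=torch.float64)
wt_bnd = torch.sigmoid(wt)

# Random pairs of inputs spread over a wide range
a = torch.randn(10_000, dtype=torch.float64) * 100.0
b = torch.randn(10_000, dtype=torch.float64) * 100.0

# 1-Lipschitz means |f(a) - f(b)| <= |a - b| for every pair
lhs = (F.prelu(a, wt_bnd) - F.prelu(b, wt_bnd)).abs()
rhs = (a - b).abs()
max_ratio = (lhs / rhs).max().item()
print(max_ratio)
```

A random check like this is not a proof, of course, but the largest ratio should never exceed 1 (up to rounding).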
If you’re asking whether prelu (input, weight) is Lipschitz continuous
when understood as a function of two variables, even though the weight
we pass in is bounded, the answer is no. For input < 0,
prelu (input, weight) = weight * input, so the partial derivative,
d prelu (input, weight) / d weight = input, is unbounded
because input can run off to -inf.
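
You can see this with autograd. The snippet below is just an illustrative sketch (the weight value 0.5 and the sample inputs are arbitrary): the gradient with respect to weight equals the input itself, so it grows without bound as the input becomes more negative.

```python
import torch
import torch.nn.functional as F

# Fixed weight inside (0, 1); we differentiate with respect to it
weight = torch.tensor([0.5], requires_grad=True)

grads = []
for value in (-1.0, -100.0, -10000.0):
    x = torch.tensor([value])
    # d prelu(x, weight) / d weight for a negative input is the input itself
    (grad,) = torch.autograd.grad(F.prelu(x, weight).sum(), weight)
    grads.append(grad.item())

print(grads)  # [-1.0, -100.0, -10000.0]
```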