Constraining Neural Network output within an arbitrary range

Mari · July 22, 2021, 7:34pm

Hi,
Any suggestions on how I can force the output of this neural network to be within an arbitrary range, e.g. (0, 0.5)?

import torch.nn as nn

# Swish, input_n and h_nk are assumed to be defined elsewhere
class Net2_kc(nn.Module):
    def __init__(self):
        super(Net2_kc, self).__init__()
        self.main = nn.Sequential(
            nn.Linear(input_n, h_nk),
            Swish(),
            nn.Linear(h_nk, h_nk),
            Swish(),
            nn.Linear(h_nk, h_nk),
            Swish(),
            # ...
            nn.Linear(h_nk, 1),
        )

    def forward(self, x):
        output = self.main(x)
        return output

eqy · July 22, 2021, 7:48pm

Constraining the range is relatively straightforward (although you might want to consider whether you want all outputs in this range to be equally likely).
A simple way to do this is to add a sigmoid layer, which constrains the output to (0, 1), and then scale that output so that it lies in (0, 0.5).

Mari · July 22, 2021, 8:13pm

Thanks for the reply.
Apart from adding an activation function like Sigmoid on the output, what do you suggest? Suppose I want my output within the range (0, 2).

Can you please provide an example of scaling the output?

eqy · July 22, 2021, 8:15pm

def forward(self, x):
    x = self.main(x)
    output = torch.sigmoid(x) * 2
    return output
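
To generalize this scaling to an arbitrary range, one option is to map the sigmoid output affinely onto (low, high). A minimal sketch, assuming a hypothetical helper `scale_to_range` and bounds `low`/`high` that are not part of the original post:

```python
import torch

def scale_to_range(x: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Squash unbounded values into (low, high) via a scaled, shifted sigmoid."""
    # sigmoid maps to (0, 1); the affine step stretches and shifts that interval
    return low + (high - low) * torch.sigmoid(x)

raw = torch.tensor([-3.0, 0.0, 3.0])   # stand-in for the output of self.main(x)
out = scale_to_range(raw, 0.0, 0.5)    # every entry now lies between 0 and 0.5
```

Inside `forward`, this would replace the `torch.sigmoid(x) * 2` line; a raw value of 0 lands at the midpoint of the range.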

phanindra_parashar · August 31, 2021, 1:02pm

This method will constrain the output, but there is a small issue: the gradient gets smaller as the output approaches the lower and upper bounds.
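
The shrinking gradient follows from the sigmoid derivative, sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)), which peaks at 0.25 for x = 0 and decays toward zero as the output nears either bound. A quick numerical check in plain Python (the helper names are illustrative):

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    # derivative of the sigmoid expressed through its own output
    s = sigmoid(x)
    return s * (1.0 - s)

# gradient at increasingly large pre-activation values
grads = {x: sigmoid_grad(x) for x in (0.0, 2.0, 5.0, 10.0)}
for x, g in grads.items():
    print(f"x={x:5.1f}  sigmoid={sigmoid(x):.5f}  grad={g:.2e}")
```

Any constant scale factor applied after the sigmoid multiplies these gradients by the same constant, so the decay pattern is unchanged.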

googlebot · August 31, 2021, 6:17pm

That’s kind of normal, and it at least allows stable optimization near the limits (the original 0 and 1 are usually hard thresholds). If you want to avoid this, you can try .clamp() instead, which masks gradients when values are out of range, IIRC. I have a concern that values may get stuck at the limits with this, though.
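
The gradient masking can be checked directly with autograd. A small sketch with made-up input values:

```python
import torch

x = torch.tensor([-1.0, 0.2, 0.9], requires_grad=True)
y = x.clamp(0.0, 0.5)   # hard limits at 0 and 0.5
y.sum().backward()
grad = x.grad           # 1 for the in-range entry, 0 for the clamped ones
```

Entries that receive a zero gradient get no update signal from this path, which is how values could get stuck at the limits.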

Mari · August 31, 2021, 8:51pm

I tried different activation functions, like sigmoid and relu, to force the output to be positive, but these activation functions make the gradient very small, and the output of kc ends up as a very small value close to zero. The only activation function that gives a nonzero value is swish, but then the output is negative. Do you have any suggestions for how I can force the output to be positive and in the range (0, 0.5)?

googlebot · August 31, 2021, 9:38pm

Hardtanh(0, 0.5) has a constant gradient inside the range, but it is almost the same as clamp; you have to ensure that pre-activation inputs in the inner region exist throughout training. Perhaps batchnorm could help here.
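
A short sketch of the Hardtanh variant, with made-up inputs, showing that it matches clamp in the forward pass and passes a gradient of 1 only for inputs inside the range:

```python
import torch
import torch.nn as nn

bound = nn.Hardtanh(min_val=0.0, max_val=0.5)  # could be appended as the last layer
x = torch.tensor([-1.0, 0.1, 0.4, 2.0], requires_grad=True)
y = bound(x)
y.sum().backward()
# x.grad is 1 where 0 < x < 0.5 and 0 elsewhere
```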

Mari · August 31, 2021, 11:31pm

Thank you for the reply. I have tried batch normalization and clamp. The problem still exists.

googlebot · September 1, 2021, 2:15am

There are no catastrophic problems with small gradients in any of these approaches per se, but they fail if you push pre-activation values too far (due to a big LR, gradient explosions/spikes etc.), usually early in training. You can try to “bandaid fix” this issue with something like x.clamp(-5,5).sigmoid() * scale.

I also realized that batchnorm is only appropriate for internal layers, and there is normally no reason to restrict ranges there.

Mari · September 1, 2021, 2:29am

Do you mean I need to apply it to the output like this?
def forward(self, x):
    x = self.main(x)
    output = x.clamp(-5, 5).sigmoid() * scale
    return output

Mari · September 1, 2021, 2:40am

There is an issue here. I tried different things to force the output to be positive and in range, but with all of them I get a very small output value close to zero. (The methods make the output positive, and since it is close to zero it is technically in range.) It seems that something is pushing the output toward negative values, so with the constraint applied it just ends up at zero. However, I know that the correct value should lie between 0 and 0.5 (the output should change linearly with the input).

googlebot · September 1, 2021, 6:32pm

Restraints may work well for distribution parameters, esp. in models with random sampling. I’m not sure about using them on loss inputs: you kind of need balanced positive/negative errors. If your targets are often at the bounds, or under some other conditions, the bounds will be strong attractors (affecting shared parameters in your output layer).

In other words, if you use, say, MSELoss, you postulate that your model should have Gaussian prediction errors, but bounded predictions conflict with that, esp. with narrow ranges.
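
One way to see the conflict, sketched in plain Python under the assumption that the prediction is `0.5 * sigmoid(z)` and a target sits exactly on the upper bound (both choices are illustrative, not taken from the model in this thread):

```python
import math

def bounded_pred(z: float, scale: float = 0.5) -> float:
    # prediction squashed into (0, scale) by a scaled sigmoid
    return scale / (1.0 + math.exp(-z))

target = 0.5  # hypothetical target lying on the upper bound
residuals = [bounded_pred(z) - target for z in (-4.0, 0.0, 4.0, 8.0)]
# every residual is negative: the prediction can only undershoot this target
```

With such a target the errors can never be balanced around zero, so the loss keeps pushing the pre-activation in one direction.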