Any suggestions on how I can force the output of this neural network to be within an arbitrary range, e.g. (0, 0.5)?
self.main = nn.Sequential(
output = self.main(x)
Constraining the range is relatively straightforward (although you might want to consider if you want all outputs in this range to be equally likely).
A simple way to do this is to add a sigmoid layer (which will constrain the range to be between (0, 1)) and then to scale that output so that it is between (0, 0.5).
Thanks for the reply.
Apart from adding an activation function like Sigmoid on the output, what do you suggest? Suppose I want my output within the range (0, 2).
Can you please provide an example of scaling the output?
x = self.main(x)
output = torch.sigmoid(x)*2
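The same idea generalizes to any interval by shifting as well as scaling. The following is only a minimal sketch: the module name `ScaledSigmoid` and its `low`/`high` arguments are hypothetical and do not come from the thread.

```python
import torch
import torch.nn as nn


class ScaledSigmoid(nn.Module):
    """Hypothetical helper: squashes inputs into (low, high) with a sigmoid."""

    def __init__(self, low: float, high: float):
        super().__init__()
        self.low = low
        self.high = high

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # sigmoid(x) lies in (0, 1); rescale and shift it to (low, high).
        return self.low + (self.high - self.low) * torch.sigmoid(x)


head = ScaledSigmoid(0.0, 0.5)
x = torch.linspace(-4.0, 4.0, steps=9)  # stand-in for self.main(x)
out = head(x)
```

For very large pre-activation values, floating-point saturation can make the output numerically equal to a bound, so the interval is open only up to rounding.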
This method will constrain it, but there is a small issue: the gradient gets smaller near the lower and upper bounds.
That’s kind of normal, and at least it allows stable optimization near the limits (the original 0 and 1 are usually hard thresholds). If you want to avoid this, you can try .clamp() instead, which masks gradients when values are out of range, IIRC. I have a concern that values may get stuck at the limits with this, though.
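To make the difference concrete, here is a small sketch comparing the gradients of a scaled sigmoid with those of clamp. The input values are arbitrary and chosen only for illustration.

```python
import torch

# Two values far outside (0, 0.5) and one inside it.
x = torch.tensor([-10.0, 0.25, 10.0], requires_grad=True)

# Scaled sigmoid: gradients shrink toward the bounds but stay nonzero here.
(torch.sigmoid(x) * 0.5).sum().backward()
sigmoid_grad = x.grad.clone()

x.grad.zero_()

# clamp: gradient is 1 inside the range and exactly 0 outside it.
x.clamp(0.0, 0.5).sum().backward()
clamp_grad = x.grad.clone()

print(sigmoid_grad)
print(clamp_grad)
```

With clamp, an input that lands outside the range receives no gradient at all, which is the "stuck at limits" concern mentioned above.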
I tried different activation functions like sigmoid and relu to force the output to be positive, but these activation functions make the gradient very small, and the output of kc ends up being a very small value close to zero. The only activation function that gives a nonzero value is swish, but then the output is negative. Do you have any suggestions for how I can force the output to be positive and in the range (0, 0.5)?
Hardtanh(0,0.5) has a constant gradient inside the range, but it is almost the same as clamp; you have to ensure that some pre-activation inputs stay in the inner region throughout training. Perhaps batchnorm could help here.
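A quick sketch of that behaviour, using `nn.Hardtanh` with `min_val=0.0` and `max_val=0.5` on a few arbitrary illustrative inputs:

```python
import torch
import torch.nn as nn

act = nn.Hardtanh(min_val=0.0, max_val=0.5)

# Two inputs inside the linear region, two outside it.
x = torch.tensor([-1.0, 0.1, 0.4, 2.0], requires_grad=True)
y = act(x)
y.sum().backward()

print(y)       # values clipped to [0, 0.5]
print(x.grad)  # 1 inside the range, 0 outside
```

Only the inputs that fall inside the range receive a gradient, which is why at least some pre-activations need to stay in the inner region for training to progress.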
Thank you for the reply. I have tried batch normalization and clamp. The problem still exists.
There are no catastrophic small-gradient problems with any of these approaches per se, but they fail if the pre-activation values get pushed too far (due to a large LR, gradient explosions/spikes, etc.), usually early in training. You can try to “band-aid” this issue with something like x.clamp(-5,5).sigmoid() * scale.
I also realized that batchnorm is only appropriate for internal layers, and there is normally no reason to restrict ranges there.
Do you mean I need to apply it to the output like this:
x = self.main(x)
output = x.clamp(-5,5).sigmoid()*scale
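For reference, a self-contained sketch of the clamp-then-sigmoid pattern. The value `scale = 0.5` and the input tensor are assumptions made for illustration; in the model, `x` would be the result of `self.main(x)`.

```python
import torch

scale = 0.5  # assumed upper bound of the target range (0, 0.5)

# Stand-in for self.main(x); includes extreme pre-activation values.
x = torch.tensor([-100.0, 0.0, 100.0], requires_grad=True)

# Clamp first so the sigmoid never fully saturates, then rescale.
output = x.clamp(-5, 5).sigmoid() * scale
output.sum().backward()

print(output)  # stays strictly inside (0, scale)
print(x.grad)  # zero where x was outside [-5, 5]
```

Note that the clamp keeps the output away from the exact bounds, but inputs beyond ±5 still receive zero gradient, so this limits the damage from extreme values rather than removing it.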
There is an issue here. I tried different things to force the output to be positive and in range, but with all of them I get a very small value close to zero for the output. (The methods make the output positive, and since it is close to zero, it is technically in range.) It seems that something is pushing the output toward negative values, and with these methods it just ends up at zero, while I know that the correct value should be in the range 0 to 0.5 (the output should change linearly with the input).
Restraints may work well for distribution parameters, especially in models with random sampling. I’m not sure about using them on loss inputs: you kind of need balanced positive/negative errors, and if your targets are often at the bounds (or under some other conditions), the bounds will be strong attractors (affecting shared parameters in your output layer).
In other words, if you use, say, MSELoss, you postulate that your model's prediction errors are Gaussian, but bounded predictions conflict with that, especially with narrow ranges.
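The link between MSE and Gaussian errors can be written out as follows. This is the standard textbook identity, not something stated in the thread; the symbols $y$ (target), $\hat{y}$ (prediction), and $\sigma$ (a fixed noise scale) are introduced here for illustration.

```latex
% Negative log-likelihood of a target y under a Gaussian centred on the prediction \hat{y}:
-\log p(y \mid \hat{y})
  = \frac{(y - \hat{y})^2}{2\sigma^2} + \frac{1}{2}\log\!\left(2\pi\sigma^2\right)
% With \sigma fixed, minimising this over \hat{y} is the same as minimising (y - \hat{y})^2.
```

A Gaussian has support on the whole real line and is symmetric around $\hat{y}$, whereas a prediction confined to (0, 0.5) cannot overshoot a target sitting at a bound, so the errors there are one-sided.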