I’d like to train a model like the one below:
The model has 2 internal submodels, each of which is (individually) trainable.
The model has a proportion parameter w, which is restricted to the range [0, 1],
and w is also trainable.
How can I achieve this with the restricted range? Should I just clamp w on every iteration?
What does w signify? My first idea is similar to yours: clamp w to be within [0, 1]. It might get stuck at w = 0 or w = 1, though. I guess one way to find out would be to try it and see whether w changes as you train the network. Another idea is to add a term to the loss function that penalizes w for being larger than 1 or smaller than 0. A third idea is to use a sigmoid.
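The first two ideas above could be sketched roughly as follows. The tensors `y_pred1`, `y_pred2`, and `target` are random stand-ins for the submodel outputs and labels, and `range_penalty` is a hypothetical helper name; none of them come from the original question.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-ins for the two submodel outputs and the target.
y_pred1 = torch.randn(8, 1)
y_pred2 = torch.randn(8, 1)
target = torch.randn(8, 1)

# Option 1: train w directly and clamp it back into [0, 1] after each step.
w = torch.nn.Parameter(torch.tensor(0.5))
optimizer = torch.optim.SGD([w], lr=0.1)

for _ in range(20):
    optimizer.zero_grad()
    output = w * y_pred1 + (1 - w) * y_pred2
    loss = torch.nn.functional.mse_loss(output, target)
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        w.clamp_(0.0, 1.0)  # in-place projection onto [0, 1]

clamped_w = w.item()


# Option 2: a penalty that is zero inside [0, 1] and grows linearly outside.
def range_penalty(w: torch.Tensor) -> torch.Tensor:
    return torch.relu(-w) + torch.relu(w - 1.0)
```

In the first option the clamp is applied to the parameter after the optimizer step rather than inside the forward pass. Calling `torch.clamp` in the forward pass gives zero gradient once w is outside the range, which is one way it could get stuck. The penalty in the second option would be added to the loss with a weighting factor of your choosing, and it only discourages values outside [0, 1] rather than strictly forbidding them.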
w is the proportional contribution of submodel 1 to the total model.
For example, if w=0.1, submodel 1 contributes 10% to the total model and submodel 2 contributes 90%.
As you suspect, I’m afraid that w will get stuck at 0 or 1. I’ll try your suggestion (sigmoid). Is there any other way to prevent it from getting stuck?
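I guess the cleanest method would be to keep an unconstrained trainable parameter, apply a sigmoid to it, and use the result as the mixing weight:
import torch
w_raw = torch.nn.Parameter(torch.randn(3, 3))  # unconstrained, trainable
w = torch.sigmoid(w_raw)  # always in (0, 1)
output = w * y_pred1 + (1 - w) * y_pred2  # y_pred1, y_pred2: submodel outputs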
Sigmoid is smooth and differentiable everywhere, and it keeps w strictly inside (0, 1), though its gradient becomes small when the output is close to 0 or 1.
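Putting the sigmoid idea into a module might look something like the sketch below. The class name `MixtureModel`, the attribute `w_raw`, the scalar shape of the weight, and the `nn.Linear` submodels are all assumptions for illustration, not anything from the original question.

```python
import torch
import torch.nn as nn


class MixtureModel(nn.Module):
    """Blends two submodels with a trainable weight kept in (0, 1)."""

    def __init__(self, submodel1: nn.Module, submodel2: nn.Module):
        super().__init__()
        self.submodel1 = submodel1
        self.submodel2 = submodel2
        # Unconstrained parameter; sigmoid(0) = 0.5, so both submodels
        # start with an equal contribution.
        self.w_raw = nn.Parameter(torch.zeros(1))

    @property
    def w(self) -> torch.Tensor:
        # The effective mixing weight, always strictly between 0 and 1.
        return torch.sigmoid(self.w_raw)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.w
        return w * self.submodel1(x) + (1 - w) * self.submodel2(x)


torch.manual_seed(0)
# Hypothetical submodels and random data, just to exercise the module.
model = MixtureModel(nn.Linear(4, 1), nn.Linear(4, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, target = torch.randn(16, 4), torch.randn(16, 1)

for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), target)
    loss.backward()
    optimizer.step()
```

Because `w_raw` is registered as an `nn.Parameter`, `model.parameters()` hands it to the optimizer together with the submodel weights, so all three parts train jointly. After training, `model.w` gives the learned proportion for submodel 1.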