# Custom loss function with trainable parameters

Hi everyone,
I need help.
I am trying to create a custom loss function with two trainable parameters.

```python
class MyCustomLoss(nn.Module):
    def __init__(self, my_parameter1, my_parameter2):
        super(MyCustomLoss, self).__init__()
        self.A = my_parameter1
        self.B = my_parameter2

    def forward(self, inputs, targets):
        y_hat_softmax = F.softmax(inputs, dim=1)
        t = torch.argmax(y_hat_softmax, dim=1)

        # ...some custom code1...

        MyLoss = self.A * (some custom code2) + self.B * (some custom code3)

        return MyLoss
```

For the first prototype of the idea, I did a brute-force search over values of A and B between 0 and 1
in small steps (0.05). I have evidence that the idea works, but I would like the network to learn
the best values for A and B as trainable weights.

Do you have any suggestions for a code snippet? I would also like to ensure that A and B stay between 0 and 1, or at least remain positive.

Thank you

To make A and B positive, an easy way is to apply ReLU to them before multiplying with the loss terms, i.e. `MyLoss = torch.relu(self.A)*(some custom code2) + torch.relu(self.B)*(some custom code3)`. Another option is to apply `torch.exp` to A and B; this is a common trick used when training VAEs (to make the predicted variance positive).

To keep them between 0 and 1, you can similarly apply a sigmoid to A and B.
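A minimal sketch of this constraint idea (the class name and the two scalar "loss terms" passed to `forward` are hypothetical; only the parameter handling matters):

```python
import torch
import torch.nn as nn

class WeightedLoss(nn.Module):
    # Hypothetical loss with two trainable mixing weights A and B.
    def __init__(self):
        super().__init__()
        # Raw, unconstrained parameters; the constraint is applied in forward().
        self.A = nn.Parameter(torch.tensor(0.0))
        self.B = nn.Parameter(torch.tensor(0.0))

    def forward(self, loss1, loss2):
        # sigmoid squashes the raw values into (0, 1)
        a = torch.sigmoid(self.A)
        b = torch.sigmoid(self.B)
        return a * loss1 + b * loss2

criterion = WeightedLoss()
loss = criterion(torch.tensor(1.0), torch.tensor(2.0))
# With raw A = B = 0, sigmoid gives 0.5, so loss = 0.5*1 + 0.5*2 = 1.5
```

The gradients flow through the sigmoid into the raw parameters, so the optimizer updates `self.A` and `self.B` directly while the effective weights stay inside (0, 1).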

@Dazitu616 thank you! It works with that approach.
I would also like to get the final values of A and B at the end of training. How is that possible?

Thank you

Just print it out? Or load the saved state_dict of the model and find their values?

Thank you, @Dazitu616, for your suggestion.
I added A and B in the optimizer in this way:

```python
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(criterion.parameters()),
    lr=learning_rate,
)
```

It seems to me that printing the state_dict is the easiest way. However, the following code does not list A and B:

```python
# Print the model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print()

# Print the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
```

Do you have a suggestion on how to access these values from the state_dict()?

Thank you

Oh, sure. I think you need to manually register these two weights on the module via `register_parameter`. For example:

```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # register_parameter expects an nn.Parameter, not a plain tensor
        A = nn.Parameter(torch.tensor(0.5))
        self.register_parameter('A', A)
```

Then you can access it via `self.A`, and this `A` will appear in `model.state_dict()`.
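A quick way to verify this (the module name and the initial value 0.5 are just placeholders):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered parameters show up in state_dict() and parameters()
        self.register_parameter('A', nn.Parameter(torch.tensor(0.5)))

model = Model()
print('A' in model.state_dict())       # True
print(model.state_dict()['A'].item())  # 0.5
print(model.A.item())                  # same value, accessed as an attribute
```

Since the parameter is registered, it is also returned by `model.parameters()`, so passing the model to an optimizer is enough for it to be trained.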

Thank you, @Dazitu616.
I am a little confused. I already registered these parameters inside the custom loss class (as in the first post of my question). So why is there a need to repeat them inside the model class?
I used this in my custom loss class:

```python
# nn.Parameter already sets requires_grad=True on its data
self.A = torch.nn.Parameter(torch.tensor(0.0))
```

Thank you

I see, sorry for the confusion. Since you have registered them inside the custom loss class, you can print the state_dict of the loss to see `A`, i.e. `criterion.state_dict()['A']`.

Thank you, @Dazitu616.
Yes, that is the correct syntax - I get the values. However, I only get the initial values (0.0). Do you have any idea where the catch is?

Thank you

This is just my guess: since you are minimizing the loss with the optimizer, if `A` goes to 0, then the loss goes to 0, which is exactly the smallest possible value. So I guess it’s not a good idea to optimize the loss scale directly. Instead, you could add a constraint, e.g. `A + B == 1`, so that the optimizer cannot find the trivial solution of setting both `A` and `B` to 0; it has to find a balance that benefits the model. Just my random thought, though.
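One way to sketch that constraint (class and variable names are hypothetical) is to learn a single raw parameter and derive both weights from it, so `A + B == 1` holds by construction and the trivial all-zero solution is impossible:

```python
import torch
import torch.nn as nn

class ConvexWeightedLoss(nn.Module):
    # Hypothetical loss where the two weights always sum to 1.
    def __init__(self):
        super().__init__()
        # Single raw parameter; both weights are derived from it.
        self.raw = nn.Parameter(torch.tensor(0.0))

    def forward(self, loss1, loss2):
        a = torch.sigmoid(self.raw)  # in (0, 1)
        b = 1.0 - a                  # a + b == 1, so both cannot collapse to 0
        return a * loss1 + b * loss2

criterion = ConvexWeightedLoss()
loss = criterion(torch.tensor(1.0), torch.tensor(3.0))
# raw = 0 gives a = b = 0.5, so loss = 0.5*1 + 0.5*3 = 2.0
```

With this parameterization the optimizer can only trade weight between the two terms, not shrink both to zero.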

Thank you, @Dazitu616. Your suggestion was the correct one. The initial value can’t be 0.0; it has to be some other value, 1.0 for instance.
I have also noticed that, for some initial values, training drives the A or B parameter negative. So I used this code to keep the loss contributions positive:

```python
MyLoss = torch.relu(self.A) * (some custom code2) + torch.relu(self.B) * (some custom code3)
```

But is there a way to ensure that the raw value of A itself stays positive during training?

I’m afraid not. There are no constraints in the optimizer to keep them positive, right? You could experiment with different mappings such as `torch.relu` or `torch.abs`, but that’s another question.

Thank you, @Dazitu616, for your suggestions.