# Custom loss function with trainable parameters

Hi everyone,
I need help.
I am trying to create a custom loss function with two trainable parameters.

```python
class MyCustomLoss(nn.Module):
    def __init__(self, my_parameter1, my_parameter2):
        super(MyCustomLoss, self).__init__()
        self.A = my_parameter1
        self.B = my_parameter2

    def forward(self, inputs, targets):
        y_hat_softmax = F.softmax(inputs, dim=1)
        t = torch.argmax(y_hat_softmax, dim=1)

        # ...some custom code1...

        MyLoss = self.A * (some custom code2) + self.B * (some custom code3)

        return MyLoss
```

For the first prototype of the idea, I did a brute-force search over values of A and B between 0 and 1
in small steps (0.05). I have evidence that the idea works, but I would like the network to learn
the best values for A and B as trainable weights.

Do you have any suggestions for a code snippet? I would also like to ensure that A and B stay between 0 and 1, or at least remain positive.

Thank you

To make A and B positive, an easy way is to apply ReLU to them before multiplying with the loss terms, i.e. `MyLoss = torch.relu(self.A)*(some custom code2) + torch.relu(self.B)*(some custom code3)`. Another option is to apply `torch.exp` to A and B; this is a common trick used when training VAEs (to make the predicted variance positive).

To keep them between 0 and 1, you can similarly apply a sigmoid to A and B.
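A minimal sketch of this constraint idea (the class name and the two scalar "loss terms" passed to `forward` are hypothetical; only the parameter handling matters):

```python
import torch
import torch.nn as nn

class WeightedLoss(nn.Module):
    # Hypothetical loss with two trainable mixing weights A and B.
    def __init__(self):
        super().__init__()
        # Raw, unconstrained parameters; the constraint is applied in forward().
        self.A = nn.Parameter(torch.tensor(0.0))
        self.B = nn.Parameter(torch.tensor(0.0))

    def forward(self, loss1, loss2):
        # sigmoid squashes the raw values into (0, 1)
        a = torch.sigmoid(self.A)
        b = torch.sigmoid(self.B)
        return a * loss1 + b * loss2

criterion = WeightedLoss()
loss = criterion(torch.tensor(1.0), torch.tensor(2.0))
# With raw A = B = 0, sigmoid gives 0.5, so loss = 0.5*1 + 0.5*2 = 1.5
```

The gradients flow through the sigmoid into the raw parameters, so the optimizer updates `self.A` and `self.B` directly while the effective weights stay inside (0, 1).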

@Dazitu616 thank you! It works with that approach.
I would also like to get the final values of A and B at the end of training. How is that possible?

Thank you

Just print it out? Or load the saved state_dict of the model and find their values?

Thank you, @Dazitu616, for your suggestion.
I added A and B in the optimizer in this way:

```python
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(criterion.parameters()),
    lr=learning_rate,
)
```

It seems to me that printing the state_dict is the easiest way. However, the following code does not list A and B:

```python
# Print the model's state_dict
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())

print()

# Print the optimizer's state_dict
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
```

Do you have a suggestion on how to access these values from the state_dict()?

Thank you

Oh, sure. I think you need to manually register these two weights on the module via `register_parameter`. For example:

```python
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # register_parameter expects an nn.Parameter, not a plain tensor
        A = nn.Parameter(torch.tensor(0.5))
        self.register_parameter('A', A)
```

Then you can access it via `self.A`, and this `A` will appear in `model.state_dict()`.
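A quick way to verify this (the module name and the initial value 0.5 are just placeholders):

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered parameters show up in state_dict() and parameters()
        self.register_parameter('A', nn.Parameter(torch.tensor(0.5)))

model = Model()
print('A' in model.state_dict())       # True
print(model.state_dict()['A'].item())  # 0.5
print(model.A.item())                  # same value, accessed as an attribute
```

Since the parameter is registered, it is also returned by `model.parameters()`, so passing the model to an optimizer is enough for it to be trained.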

Thank you, @Dazitu616.
I am a little confused. I already registered these parameters inside the custom loss class (as in the first post of my question). So why is there a need to repeat them inside the model class?
I used this in my custom loss class:

```python
# nn.Parameter already sets requires_grad=True on its data
self.A = torch.nn.Parameter(torch.tensor(0.0))
```

Thank you

I see, sorry for the confusion. Since you have registered them inside the custom loss class, you can print the state_dict of the loss to see `A`, i.e. `criterion.state_dict()['A']`.

Thank you, @Dazitu616.
Yes, that is the correct syntax - I get the values. However, I only get the initial values (0.0). Do you have any idea where the catch is?

Thank you

This is just my guess: since you are minimizing the loss with the optimizer, if `A` goes to 0, then the loss goes to 0, which is exactly the smallest possible value. So I guess it’s not a good idea to optimize the loss scale directly. Instead, you could add a constraint, e.g. `A + B == 1`, so that the optimizer cannot find the trivial solution of setting both `A` and `B` to 0; it has to find a balance that benefits the model. Just my random thought, though.
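One way to sketch that constraint (class and variable names are hypothetical) is to learn a single raw parameter and derive both weights from it, so `A + B == 1` holds by construction and the trivial all-zero solution is impossible:

```python
import torch
import torch.nn as nn

class ConvexWeightedLoss(nn.Module):
    # Hypothetical loss where the two weights always sum to 1.
    def __init__(self):
        super().__init__()
        # Single raw parameter; both weights are derived from it.
        self.raw = nn.Parameter(torch.tensor(0.0))

    def forward(self, loss1, loss2):
        a = torch.sigmoid(self.raw)  # in (0, 1)
        b = 1.0 - a                  # a + b == 1, so both cannot collapse to 0
        return a * loss1 + b * loss2

criterion = ConvexWeightedLoss()
loss = criterion(torch.tensor(1.0), torch.tensor(3.0))
# raw = 0 gives a = b = 0.5, so loss = 0.5*1 + 0.5*3 = 2.0
```

With this parameterization the optimizer can only trade weight between the two terms, not shrink both to zero.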

Thank you, @Dazitu616. Your suggestion was the correct one. The initial value can’t be 0.0; it has to be some other value, 1.0 for instance.
I have also noticed that, for some initial values, training drives the A or B parameter negative. So I used this code to keep the loss contributions positive:

```python
MyLoss = torch.relu(self.A) * (some custom code2) + torch.relu(self.B) * (some custom code3)
```

But is there a way to ensure that the raw value of A itself stays positive during training?

I’m afraid not. There are no constraints in the optimizer to keep them positive, right? You could experiment with different mappings such as `torch.relu` or `torch.abs`, but that’s another question.

Thank you, @Dazitu616, for your suggestions.