Using Custom Loss for Maximizing Score in PyTorch


I’m using a PyTorch model with an LSTM input layer, a linear hidden layer, and 3 neurons in the output layer with a softmax activation function.

Instead of using a loss function (such as nn.MSELoss()) and passing it the model’s predictions and their targets (loss = criterion(predictions, target)), I would like to use a custom scoring function that I’ve created, score = get_score(predictions), and train the model to maximize that score.

However, since I don’t have a target for each input, the model would be evaluated only by its score, which is a single number. I’m not sure whether a single value is a valid basis for assessing a set of predictions across the entire dataset.

Is something like this possible? Whenever I’ve tried it, I get an error at the loss.backward() step, because autograd cannot trace the gradients needed for backpropagation.
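The backward() failure can be reproduced with a minimal sketch. It assumes the score is derived from torch.argmax over the predictions; the score formula below (the fraction of inputs where action 0 or 1 is chosen) is a hypothetical stand-in, not the real scoring function.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
linear = torch.nn.Linear(10, 3)
x = torch.randn(10, 10)

predictions = F.softmax(linear(x), dim=1)   # still attached to the autograd graph
actions = torch.argmax(predictions, dim=1)  # integer indices, carry no gradient

# Hypothetical score: fraction of inputs on which the model "operates"
# (chooses action 0 or 1). It is built only from the argmax output.
score = (actions != 2).float().mean()

print(predictions.requires_grad)  # True
print(score.requires_grad)        # False

try:
    score.backward()
    backward_failed = False
except RuntimeError as err:
    backward_failed = True
    print("backward failed:", err)
```

Because argmax returns indices rather than a differentiable function of the probabilities, everything computed from it is detached from the model's parameters.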

I haven’t posted the full code because it is very long, but here is a simplified example.

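import random

import torch
import torch.nn as nn
import torch.nn.functional as F

class Model_123(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering submodules
        self.linear1 = nn.Linear(10, 3)

    def forward(self, x):
        out = self.linear1(x)
        out = F.softmax(out, dim=1)
        return out

dataset = torch.randn(10, 10)
model = Model_123()
epochs = 10  # placeholder value

custom_loss_score = CustomLoss_Score()  # This is what I want to figure out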
for epoch in range(epochs):

    predictions = model(dataset)
    score = random.uniform(0, 100)

    # The random score is only for demonstration; the real one is computed from the model's actions,
    # obtained with torch.argmax on the prediction for each input.
    # That gives a success rate: a correct action 0 or action 1 earns +1, and action 2 means doing nothing.
    # If the model takes action 0 or 1 only once, gets it right, and chooses action 2 everywhere else,
    # the success rate is 100%. That is not a good result: it made a single guess and scored 100%.
    # Predicting 80% of actions correctly with a 70% operation rate would be a better outcome.
    # So, from the success rate, another function computes the score as a relation between
    # the success rate and the number of operations performed.
    # That function is much larger and is not included here.

    loss_score = custom_loss_score(predictions, score)  # Maximize the score

This scoring mechanism, represented as score, would ideally be maximized by adjusting the model’s parameters. However, I’m unsure about how to define the CustomLoss_Score() function that would achieve this.

The score is generated based on the model’s actions, and it’s calculated using a relationship between the accuracy rate and the number of operations performed. The intention is to maximize this score rather than minimizing a traditional loss function.

The reason I’m not using Reinforcement Learning is that in this case, I would have to loop through all states (or input data), which takes more time. On the other hand, with supervised learning, I can approach it in a vectorized manner. I’m following this strategy because I’m also utilizing training based on genetic algorithms, which involve multiple individuals (sets of model weights) making predictions on the data. If I were to use Reinforcement Learning, each individual would have to loop through the states, and this would be quite time-consuming, especially with a population size of, for example, 300 individuals.
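The vectorized evaluation of a whole population can be sketched roughly as follows. This is only an illustration under strong assumptions: each individual is reduced to a single linear layer (no LSTM), the weights are random stand-ins, and the fitness (operation rate) is a hypothetical placeholder for the real score.

```python
import torch

torch.manual_seed(0)
population_size, n_inputs, n_features, n_actions = 300, 10, 10, 3

dataset = torch.randn(n_inputs, n_features)
# One weight matrix and bias per individual (stand-ins for evolved parameters).
weights = torch.randn(population_size, n_features, n_actions)
biases = torch.randn(population_size, n_actions)

with torch.no_grad():  # the genetic search itself needs no gradients
    # logits[p, n, k]: individual p, input n, action k
    logits = torch.einsum("nf,pfk->pnk", dataset, weights) + biases[:, None, :]
    predictions = torch.softmax(logits, dim=-1)
    actions = predictions.argmax(dim=-1)  # shape (population_size, n_inputs)
    # Hypothetical fitness: fraction of inputs where the individual operates.
    operation_rate = (actions != 2).float().mean(dim=1)

best_individual = operation_rate.argmax().item()
```

All 300 individuals are scored in one batched operation, without a Python loop over individuals or over inputs.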

Hence, I’m employing a training approach based on gradient descent, along with training using genetic algorithms and artificial selection.

When the model seems to be stuck in a particular place, I take the weights from that model and generate a population of individuals. Genetic algorithms can guide the weights to places where gradient descent might struggle, and vice versa. When one approach isn’t yielding improvements, I switch to the other approach.
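The step of seeding a population from a stuck model's weights might look like the sketch below. It is a simplification, not the actual procedure: it uses Gaussian mutation only (no crossover), loops over individuals for readability, and the names hypothetical_score, mutate_and_select, and sigma are made up for illustration.

```python
import copy
import torch
import torch.nn as nn
from torch.nn.utils import parameters_to_vector, vector_to_parameters

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 3), nn.Softmax(dim=1))
dataset = torch.randn(10, 10)

def hypothetical_score(net, data):
    # Stand-in fitness; it may be non-differentiable since no gradient is used here.
    with torch.no_grad():
        actions = net(data).argmax(dim=1)
        return (actions != 2).float().mean().item()

def mutate_and_select(net, data, population_size=50, sigma=0.1):
    # Flatten the current weights and use them as the parent of the population.
    base = parameters_to_vector(net.parameters()).detach().clone()
    best_vec, best_score = base, hypothetical_score(net, data)
    candidate = copy.deepcopy(net)
    for _ in range(population_size):
        vec = base + sigma * torch.randn_like(base)  # Gaussian mutation
        vector_to_parameters(vec, candidate.parameters())
        s = hypothetical_score(candidate, data)
        if s > best_score:
            best_vec, best_score = vec, s
    # Load the best individual back so gradient descent can resume from it.
    vector_to_parameters(best_vec.clone(), net.parameters())
    return best_score

score_before = hypothetical_score(model, dataset)
score_after = mutate_and_select(model, dataset)
```

Because the parent is kept when no mutant beats it, the score after selection is never lower than before.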

If the score is computed from the model’s parameters using only differentiable operations, you should be able to compute the gradients w.r.t. all parameters involved during the backward pass. However, I’m currently unsure how exactly score is calculated (in your example it is a random number unrelated to the model) and what its connection to the model is.
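As a minimal sketch of what a differentiable score could look like: the example below replaces the argmax-based success rate with an expected payoff under the softmax probabilities and maximizes it by minimizing its negative. The reward table and the soft_score function are assumptions made for illustration; your real score would need an analogous differentiable formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class Model_123(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(10, 3)

    def forward(self, x):
        return F.softmax(self.linear1(x), dim=1)

def soft_score(predictions, reward):
    # Expected payoff per input under the predicted action probabilities,
    # averaged over the dataset. Uses only differentiable operations.
    return (predictions * reward).sum(dim=1).mean()

dataset = torch.randn(10, 10)
# Assumed payoff table: +1 or -1 for actions 0 and 1, 0 for action 2 ("do nothing").
reward = torch.zeros(10, 3)
reward[:, :2] = torch.randint(0, 2, (10, 2)).float() * 2 - 1

model = Model_123()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

initial_score = soft_score(model(dataset), reward).item()
for epoch in range(200):
    optimizer.zero_grad()
    score = soft_score(model(dataset), reward)
    loss = -score  # minimizing -score maximizes score
    loss.backward()
    optimizer.step()
final_score = soft_score(model(dataset), reward).item()
print(f"score: {initial_score:.3f} -> {final_score:.3f}")
```

The key points are that score keeps a grad_fn back to the model's parameters and that the sign is flipped before calling backward(), since optimizers minimize.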