Please help! Custom loss that calls another function causes the network to not update any weights

Hello!

I’m really enjoying using PyTorch for classification and regression. I have an interesting problem and I can’t quite figure out the solution; I feel like I’m really close.

My problem:

- I have created a network with three outputs; let’s call them x, y and z.
- I have a function F(x, y, z) that returns a value between 0.0 and 100.0, where 100.0 is better.
- My custom loss is thus 100 - F(x, y, z) at each step.
- The goal is to figure out the best combination of outputs for problem F(…).
- (I know a genetic algorithm will outperform this; proving that on an array of problems is my current project.)

To implement the above, I force the network to take a single piece of dummy input data with a batch size of 1, and then in the loss function I completely ignore the ‘true’ and ‘predicted’ values and replace the loss with 100 - F(x, y, z). Basically, the weights and outputs lead to one candidate solution at every epoch.

Outputs are rounded to integers, since F(…) requires them. To stop the rounding from being an issue, I use a large momentum and learning rate.

The issue I’m having is that, although the loss function is running and my first [x,y,z] is being evaluated, the values never change. The network isn’t learning from the results produced.

My code is as follows:
Note: testnetwork() is too long to paste, but it is the F(x, y, z) mentioned above. Any dummy function can replace it, e.g. return x + z*y/2, so that the network minimises 100 - (x + z*y/2).
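(If anyone wants to actually run the snippet below, a purely hypothetical stand-in for testnetwork, along the lines of the dummy formula above, could be the following - just drop the from testnetwork import * line if you use it.)

def testnetwork(x, y, z):
    # hypothetical dummy stand-in for the real fitness function:
    # clamp the suggested x + z*y/2 formula into the expected 0-100 range
    return max(0.0, min(100.0, x + z * y / 2))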

import torch
import torch.nn as nn

from testnetwork import *


n_in, n_h, n_out, batch_size = 10, 5, 3, 5

x = torch.randn(batch_size, n_in)
y = torch.tensor([[1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [1.0], [0.0], [1.0], [1.0]])

model = nn.Sequential(nn.Linear(n_in, n_h),
                     nn.ReLU(),
                     nn.ReLU()
                     )

def fitness(string):
    print(string)
    list = string.split(",")
    list[0] = (int(round(float(list[0]))))
    list[1] = (int(round(float(list[1]))))
    list[2] = (int(round(float(list[2]))))
    print(list)
    loss = 100 - testnetwork(list[0], list[1], list[2])
    return loss


def my_loss(output, target):
    table = str.maketrans(dict.fromkeys('tensor()'))
    ftn = fitness(str(output.data[0][0]).translate(table) + ", " + str(output.data[0][1]).translate(table) + ", " + str(output.data[0][2]).translate(table))
    

    loss = torch.mean((output - output)+ftn)
    
    return loss



#optimizer = torch.optim.SGD(model.parameters(), lr=1, momentum=2)
optimizer = torch.optim.Adam(model.parameters(), lr=1, momentum=2)

for epoch in range(10):
    # Forward Propagation
    y_pred = model(x)
    # Compute and print loss
    loss = my_loss(y_pred, y)
    print('epoch: ', epoch,' loss: ', loss.item())
    # Zero the gradients
    optimizer.zero_grad()
    
    # perform a backward pass (backpropagation)
    loss.backward(retain_graph=True)
    
    # Update the parameters
    optimizer.step()
		

Thank you so much for reading my post! :slight_smile:
Jordan

Edit: This is the console output if you want to see it

epoch:  0  loss:  50.339725494384766
0., 0.0200, 0.6790
[0, 0, 1]
testing: [0, 0, 1]
epoch:  1  loss:  50.339725494384766
0., 0.0200, 0.6790
[0, 0, 1]
testing: [0, 0, 1]
epoch:  2  loss:  50.339725494384766
0., 0.0200, 0.6790
[0, 0, 1]
testing: [0, 0, 1]
epoch:  3  loss:  50.339725494384766
0., 0.0200, 0.6790
[0, 0, 1]
testing: [0, 0, 1]
epoch:  4  loss:  50.339725494384766
0., 0.0200, 0.6790
[0, 0, 1]

…and so on, nothing seems to change from epoch to epoch.

I’m not sure I understand your code correctly, but there might be some parts missing.

E.g. your model is currently a single linear layer with two ReLUs afterwards.
It seems the output layer is missing.
Also, in your my_loss method you are calculating torch.mean((output - output) + ftn), which is effectively just the constant ftn. Since testnetwork is missing, I’m not sure what fitness does or how your loss is supposed to work.

Could you check these issues or explain them a bit if I’m missing something?

Hi ptrblck,

Thank you for your response. I don’t have an activation on the output layer because I just want the three raw numbers output (as in regression problems) - have I programmed this wrong? My previous experience is with Keras, where I simply don’t give the final layer an activation function.

I did output - output as a last-ditch effort, since I want the loss of the network to simply be ftn (torch.mean(ftn) throws an error, since ftn is a plain Python float rather than a tensor). It doesn’t work: no training is performed after the initial random weights are evaluated for fitness. I think it’s something to do with the gradients being killed, but I’m unfortunately not knowledgeable enough to figure out how to overcome this.

Pretty much all I want the network to do is learn from a single dummy input over and over again, where the loss is a custom fitness(output 1, output 2, output 3), and thus update its weights towards minimising this function.

Thank you

The gradients will be all zero, since output - output creates a zero tensor and ftn is detached from the computation graph by the str transformations.
The loss calculation should depend on the actual computation graph, i.e. the model prediction should be involved in the calculation and not cancel itself out.
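To make that concrete, here is a tiny sketch (with a made-up score in place of testnetwork) showing the difference between a loss that has been detached and one that stays in the graph:

import torch
import torch.nn as nn

model = nn.Linear(10, 3)
x = torch.randn(1, 10)
output = model(x)

# Detached: converting to a Python float (or str) leaves the graph behind,
# so no gradient ever reaches the model.
score = float(output.sum())
loss_detached = torch.tensor(100.0 - score, requires_grad=True)
loss_detached.backward()
print(model.weight.grad)    # None - the model never sees a gradient

# Attached: the same arithmetic done on the tensor keeps the graph intact.
loss_attached = 100.0 - output.sum()
loss_attached.backward()
print(model.weight.grad)    # populated - gradients flow back into the model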

Your current model is basically: linear -> ReLU -> ReLU.
If you just want a single linear layer without a non-linearity, you could just use:

model = nn.Linear(n_in, n_out)

Here is a simple training routine learning just constant values:

import torch.nn.functional as F    # needed below for F.mse_loss

n_in, n_out, batch_size = 10, 3, 5  # same values as in your snippet above

model = nn.Linear(n_in, n_out)
x = torch.randn(batch_size, n_in)
y = torch.tensor([[100.0, 100.0, 100.0]] * batch_size)
optimizer = torch.optim.Adam(model.parameters(), lr=1.)

for epoch in range(200):
    optimizer.zero_grad()
    output = model(x)
    loss = F.mse_loss(output, y)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}'.format(
        epoch, loss.item()))

print('output', output)
> output tensor([[100.0103, 100.0111, 100.0113],
        [100.0012, 100.0013, 100.0013],
        [100.0001,  99.9994,  99.9993],
        [100.0003, 100.0002, 100.0002],
        [ 99.9976,  99.9974,  99.9973]], grad_fn=<AddmmBackward>)

Let me know how your custom loss function should work and we can try to adapt the code to your use case.

Hello,

Firstly let me thank you so much for helping me, I really appreciate this.

I think, by the sounds of your first comment, I may have gotten confused about constructing the network as well as about how the loss function works; it should look something like this:

layer_name(neurons)

in(1) (dummy) ->
hidden1(32) some_activation ->
hidden2(32) some_activation ->
out(3) no_activation
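In PyTorch that architecture would look something like this (a sketch; ReLU is just an assumed placeholder for “some_activation”):

import torch.nn as nn

n_in, n_h, n_out = 1, 32, 3   # 1 dummy input, two hidden layers of 32, 3 raw outputs

model = nn.Sequential(
    nn.Linear(n_in, n_h), nn.ReLU(),   # hidden1(32) + activation
    nn.Linear(n_h, n_h), nn.ReLU(),    # hidden2(32) + activation
    nn.Linear(n_h, n_out),             # out(3), no activation
)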

At each step I then want to send the three numerical values from the final layer to my fitness function, rather than using a classification error/MSE etc.

The fitness function is testnetwork(out[0], out[1], out[2]), where the three parameters are the numerical values predicted by the network. It only accepts integers, which is why I apply int(round(float(out[x]))): the outputs are rounded to the nearest whole number and cast to int.

Fitness will return a float score between 0.0 and 100.0 where 100.0 is the best score possible and 0.0 is the worst. Because of this, my loss is:
loss = 100 - testnetwork(list[0], list[1], list[2])

Since 100 - testnetwork will then give a value between 0.0 and 100.0 where 0.0 is the best, the loss can now be minimised towards a good outcome.
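(As an aside, the rounding can be done directly on the output tensor instead of going through str(); a hypothetical rewrite of my_loss is below. Note that it still detaches the values from the graph, which is the underlying problem.)

def my_loss(output, target):
    # round the three raw outputs to ints without the string round-trip
    x_i, y_i, z_i = (int(round(v)) for v in output[0].tolist())
    ftn = 100.0 - testnetwork(x_i, y_i, z_i)
    # ftn is a plain Python float, so this loss is still cut off from the graph
    return torch.mean((output - output) + ftn)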

What testnetwork(x, y, z) is actually doing in the background is training a deep, dense neural network with x, y and z neurons in hidden layers 1, 2 and 3. The goal is thus to find the best network to classify the dataset given to this second network, i.e. the PyTorch network is learning to search for the best configuration of the second network. (I have genetic and PSO algorithms that do this, but I want to compare them against a double-network approach, for better or worse.)
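(Purely for illustration, since the real testnetwork isn’t posted in the thread: what it is described as doing is roughly the following, here with a made-up dataset and training loop.)

import torch
import torch.nn as nn
import torch.nn.functional as F

def testnetwork(x, y, z, n_features=10, n_classes=2, steps=50):
    # Build a classifier with x, y and z neurons in its three hidden layers
    # (assumes x, y, z >= 1), train it briefly, and return a 0-100 score.
    data = torch.randn(64, n_features)            # stand-in dataset
    labels = torch.randint(0, n_classes, (64,))
    inner = nn.Sequential(
        nn.Linear(n_features, x), nn.ReLU(),
        nn.Linear(x, y), nn.ReLU(),
        nn.Linear(y, z), nn.ReLU(),
        nn.Linear(z, n_classes),
    )
    opt = torch.optim.Adam(inner.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(inner(data), labels).backward()
        opt.step()
    with torch.no_grad():
        acc = (inner(data).argmax(dim=1) == labels).float().mean().item()
    return 100.0 * acc                            # fitness in [0, 100]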

My mistake, I think, is that I misunderstood the structure and mechanics of tensors: I initially thought the loss was simply a metric I could replace with an arbitrary float, but unfortunately not. I’m looking for a (possibly hacky) way to do this, where the metric is based on the multiple outputs of the network at each step and those outputs are evaluated by an external function.

Hope you can help
Thank you very much again! :smile:
Jordan

EDIT: Upon experimentation with one Linear layer, I tried this as my loss:

loss = torch.mean((output - output.detach())+ftn)

Now the weights change, but they only ever go down, no matter what happens. Here’s the output from the console (note: coincidentally [1, 2, 0] and [1, 1, 0] have the same fitness; the numbers should really be bigger, hence the larger steps taken by the optimiser):

1.4133, 1.5719, 0.2245
['1.4133', ' 1.5719', ' 0.2245']
testing: [1, 2, 0]
epoch:  0  loss:  55.2411003112793
1.2133, 1.3719, 0.0245
['1.2133', ' 1.3719', ' 0.0245']
testing: [1, 1, 0]
epoch:  1  loss:  55.2411003112793
0.8133, 0.9719, -0.3755
['0.8133', ' 0.9719', ' -0.3755']
testing: [1, 1, 0]
epoch:  2  loss:  55.2411003112793
0.2133, 0.3719, -0.9755
['0.2133', ' 0.3719', ' -0.9755']
epoch:  3  loss:  100.0
-0.5867, -0.4281, -1.7755
['-0.5867', ' -0.4281', ' -1.7755']
epoch:  4  loss:  100.0
-1.5867, -1.4281, -2.7755
['-1.5867', ' -1.4281', ' -2.7755']
epoch:  5  loss:  100.0
-2.7867, -2.6281, -3.9755
['-2.7867', ' -2.6281', ' -3.9755']
epoch:  6  loss:  100.0
-4.1867, -4.0281, -5.3755
['-4.1867', ' -4.0281', ' -5.3755']
epoch:  7  loss:  100.0
-5.7867, -5.6281, -6.9755
['-5.7867', ' -5.6281', ' -6.9755']
epoch:  8  loss:  100.0
-7.5867, -7.4281, -8.7755
['-7.5867', ' -7.4281', ' -8.7755']
epoch:  9  loss:  100.0

PS: If I switch around the loss to:

loss = torch.mean((output.detach() - output)+ftn)

then the opposite occurs: the outputs simply increase according to the learning rate and momentum, with no influence from the value of ftn computed from the three outputs.
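For what it’s worth, a quick gradient check (a minimal sketch, not from the original script) shows why both variants ignore the fitness entirely: since ftn is a plain float, the gradient of mean((output - output.detach()) + ftn) with respect to each of the three outputs is just the constant 1/3, so every output gets pushed down (or up, with the detach flipped) by the same amount no matter what the fitness was.

import torch

output = torch.randn(1, 3, requires_grad=True)
ftn = 55.24    # plain float from the external fitness call
loss = torch.mean((output - output.detach()) + ftn)
loss.backward()
print(output.grad)    # tensor([[0.3333, 0.3333, 0.3333]]) - constant and independent of ftn,
                      # so the fitness never steers the update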

I’m having the same issue … did you solve this? And if so, how?

Hi @electric93 - it’s been a long time since I posted this and I’ve since learnt a lot about neural networks! I instead decided to go with a Deep Q-Learning approach for this problem rather than a direct learning function, but I’ve found that heuristic searches are often much better and less computationally expensive.

Okay, maybe your problem is not as close to mine as I thought. My issue is that I generate parameters with a CNN; those parameters drive rendering software (Blender), which outputs an image, and I then compare that image to the target image (which I do have) - that comparison is basically the loss. But since all of this happens externally (I’d be very surprised if something like this were possible with PyTorch functions), I don’t know how to backpropagate such a loss: there’s basically a gap in the computational graph, because the rendering and the image comparison are not PyTorch operations. How would you go about something like this?
Thank you for your time and have a wonderful day!