# Weights not updating for custom loss function

I know many have answered this issue, but my problem is slightly different and I could not find an answer for it.

Basically, I am creating my own custom loss (cost) function. However, the MLP output is not passed directly into the loss; instead, it is used to calculate the actual loss in the system. I have coded a very simplified example of my problem.

My intuition is that my current implementation breaks the computational graph, so the optimizer does not update the weights. My question is: how can this be modified so that the computational graph stays intact and the weights update?

Please let me know if you need more explanation of my problem. Thank you in advance!

Below is the sample code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    def __init__(self, input_dim, output_dim, activation, bias):
        super().__init__()

        self.input_fc = nn.Linear(input_dim, 200, bias=bias)
        self.hidden_fc = nn.Linear(200, 150, bias=bias)
        self.output_fc = nn.Linear(150, output_dim, bias=bias)
        self.activation = activation

    def forward(self, x):
        h_1 = self.activation(self.input_fc(x))
        h_2 = self.activation(self.hidden_fc(h_1))
        y_pred = self.output_fc(h_2)

        y_prob = F.softmax(y_pred, dim=1)

        return y_prob


def cost(y_prob, target):
    torch.manual_seed(10)
    cost_array = torch.rand(1, output_dim)  # random cost array
    idx_max = torch.argmax(y_prob)

    # To simplify, the cost is chosen with respect to the node with highest probability.
    # NOTE: the line that builds `cost` is missing from the post. The re-wrapping
    # below is an assumption based on the reply that follows the code.
    cost = torch.tensor(cost_array[0, idx_max.item()].item(), requires_grad=True)

    return cost


torch.manual_seed(10)  # just to have same result every run

input_dim, output_dim = 10, 9

activation = F.relu
bias = True
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Classifier = MLP(input_dim, output_dim, activation, bias).to(device)
learning_rate = 1e-3
# NOTE: the optimizer definition is missing from the post; Adam is an assumption.
optimizer = torch.optim.Adam(Classifier.parameters(), lr=learning_rate)

numEpochs = 10
input_mlp = torch.rand(1, input_dim).to(device)

target = torch.tensor(0).to(device)  # target value is set to zero

for epoch in range(numEpochs):

    # using weight norm to check if the weights are changing
    weight_norm = 0
    for param in Classifier.parameters():
        weight_norm += torch.norm(param, 2).item()

    y_prob = Classifier(input_mlp)

    loss = cost(y_prob, target)

    print('Weight norm: {:.10f}, Loss = {:.4f}'.format(weight_norm, loss.item()))

    optimizer.zero_grad()  # clear gradients left over from the previous step
    loss.backward()
    optimizer.step()
```

That’s correct, since `torch.argmax` is not differentiable and you are trying to “fix” this by re-wrapping the result into a new tensor with `requires_grad=True`.
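To make the point above concrete, here is a minimal sketch of where the graph gets cut. The tensor `w` is a hypothetical stand-in for the model weights, and the re-wrapping line is an assumed version of what the `cost` function does, not the original code.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the model parameters.
w = torch.rand(1, 4, requires_grad=True)
y_prob = torch.softmax(w, dim=1)  # still connected to w

# argmax returns an integer index, which carries no gradient information.
idx = torch.argmax(y_prob)

cost_array = torch.rand(1, 4)
# Re-wrapping the looked-up value creates a brand-new leaf tensor
# that has no connection back to w.
loss = torch.tensor(cost_array[0, idx.item()].item(), requires_grad=True)

loss.backward()  # runs without error, but nothing flows back to w

print(idx.requires_grad)  # False
print(loss.grad_fn)       # None
print(w.grad)             # None
```

Because `w.grad` stays `None`, the optimizer has nothing to apply, which matches the unchanged weight norm.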

Thanks for confirming my doubt. Is there any fix for this that you could suggest? I would really appreciate it.

I’m unsure how the `cost` function should work right now, as it seems you are randomly initializing the `cost_array` and trying to use the predictions from the model to index it. `cost_array` is not attached to the computation graph at all, and I don’t know how the gradients should be calculated in this case (it won’t have a valid `.grad_fn`, as it’s a new leaf tensor).
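A small check along these lines shows the difference between the model output and the cost array. The names `w` and `selected` are hypothetical and only serve the illustration.

```python
import torch

torch.manual_seed(0)

w = torch.rand(1, 4, requires_grad=True)  # hypothetical stand-in for the weights
y_prob = torch.softmax(w, dim=1)
cost_array = torch.rand(1, 4)             # freshly created, not part of any graph

# Indexing cost_array with the argmax only reads a value out of a constant.
selected = cost_array[0, torch.argmax(y_prob)]

print(y_prob.grad_fn is not None)  # True: y_prob depends on w
print(cost_array.grad_fn)          # None: a leaf with no history
print(selected.requires_grad)      # False: nothing to differentiate

# Calling backward on a tensor that does not require grad raises an error.
try:
    selected.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True

print(backward_failed)  # True
```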

The cost array in the example is random, but in my actual problem it comes from the system that I am working on. Anyhow, I understand what you are saying, but I really hope to find a solution. Please reply if anything “clicks” for you on this matter.
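One possible direction, offered only as a sketch: if the system can supply a cost for every output node as a tensor, the hard `argmax` lookup could be replaced by the expected cost under the predicted probabilities, which stays differentiable. This is a different objective (a smooth surrogate), not the original argmax-based cost, and whether it suits the real system is an open question. The helper `expected_cost`, the `nn.Sequential` model, and the SGD settings are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(10)
input_dim, output_dim = 10, 9

# Same layer sizes as the MLP in the question, written compactly.
model = nn.Sequential(
    nn.Linear(input_dim, 200), nn.ReLU(),
    nn.Linear(200, 150), nn.ReLU(),
    nn.Linear(150, output_dim),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # assumed optimizer

x = torch.rand(1, input_dim)
cost_array = torch.rand(1, output_dim)  # stand-in for the costs from the system


def expected_cost(y_prob, cost_array):
    # Probability-weighted sum of costs; differentiable with respect to y_prob.
    return (y_prob * cost_array).sum(dim=1).mean()


losses = []
for epoch in range(20):
    optimizer.zero_grad()
    y_prob = F.softmax(model(x), dim=1)
    loss = expected_cost(y_prob, cost_array)
    loss.backward()   # gradients now reach the model parameters
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Here the cost array acts as a constant weighting, so it does not need a `.grad_fn` itself; the gradient flows through `y_prob` only.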