I have an ML model (a GNN, to be more precise) with a softmax activation function in its output layer, with num_classes = K.

The way my custom loss function works is as follows (summary):

Takes the output of the whole dataset.

Samples a class for each data point.

Builds a graph structure with at most K nodes.

Runs an evaluation algorithm of my own that returns a value. The higher the value, the better the classification.

Then, I use PyTorch's loss.backward() to compute the gradients.

def train(data):
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    L = sampling_classes(values, torch.exp(out))
    graph = build_graph(L)
    loss = loss_value_function(graph, out)
    loss.backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def loss_value_function(graph, out):
    val = evaluate_graph(graph)
    return (torch.sum(torch.exp(out)) - torch.sum(torch.tensor(val)))

In return (torch.sum(torch.exp(out)) - torch.sum(torch.tensor(val))), I realised I have to subtract torch.sum(torch.tensor(val)) from torch.sum(torch.exp(out)) (which is always the number of data points); otherwise I break the gradient, and the parameters are not updated if I only return (-1) * torch.sum(torch.tensor(val)). Is this correct? Is there any way I can just return (-1) * torch.sum(torch.tensor(val)) and still get the parameters updated when I perform the backward pass?

I tried returning (-1) * torch.sum(torch.tensor(val)) as the loss and then doing loss = torch.tensor(loss, requires_grad=True), but the parameters are never updated.

When you do torch.tensor(val) you’re breaking your computational graph, which is why your parameters are not updating. So, remove all these torch.tensor and re-run your code and see if it updates then!
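To make the point about torch.tensor concrete, here is a minimal sketch. The names w, out, attached and detached are made up for illustration and stand in for the model parameters and output; they are not from the original code.

```python
import torch

w = torch.ones(3, requires_grad=True)      # stand-in for model parameters
out = w * 2.0                              # stand-in for the model output

attached = out.sum()                       # built from torch ops: part of the graph
detached = torch.tensor(out.sum().item())  # re-created from a Python float: a new leaf

has_graph_attached = attached.grad_fn is not None
has_graph_detached = detached.grad_fn is not None
print(has_graph_attached, has_graph_detached)  # True False

attached.backward()
grad_w = w.grad.clone()  # the gradient reaches w only through `attached`
```

Setting requires_grad=True on the re-created tensor would only make it a fresh leaf that collects its own gradient; it would still have no path back to w.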

Hi @AlphaBetaGamma96, thank you for your reply. I have tried this; however, if I remove torch.tensor(), then loss.backward() raises AttributeError: 'numpy.float64' object has no attribute 'backward':

def loss_value_function(graph, out):
    val = evaluate_graph(graph)
    return np.sum(val)

You cannot use numpy arrays directly as this will also break the computation graph. Either use PyTorch operations only or write a custom autograd.Function if you need to use numpy operations.
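As a rough illustration of the custom autograd.Function route, the sketch below wraps a NumPy computation and supplies its derivative by hand. NumpySquare is a hypothetical example, not the evaluation function from this thread, and it assumes the NumPy step is differentiable with a known gradient.

```python
import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    """Hypothetical example: y = x**2 computed in NumPy, backward written by hand."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        result = np.square(x.detach().cpu().numpy())  # NumPy-only computation
        return torch.from_numpy(result).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2.0 * x  # dy/dx = 2x, provided manually

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
loss = NumpySquare.apply(x).sum()
loss.backward()
x_grad = x.grad.clone()
print(x_grad)  # tensor([2., 4., 6.])
```

This only helps when a meaningful backward can be written. An evaluation that runs on discretely sampled classes generally has no such derivative, so this route may not apply to the loss discussed here.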

Hi @ptrblck, thank you for your response. That’s true. I also tried return torch.sum(torch.tensor(VK, requires_grad=True)), but the computational graph still breaks and the parameters don’t update either.
However, if I do return (torch.sum(torch.exp(out)) - torch.sum(torch.tensor(VK))), then the parameters are updated, which I find pretty weird. Any suggestions?

Recreating a tensor will break the computation graph as already mentioned. If your second approach works fine I would guess that another tensor might be attached to a computation graph (out) while VK acts as a constant and won’t affect the gradient calculation.
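A small check of that guess, as a sketch under one assumption: that out holds log_softmax outputs, which the torch.exp(out) call suggests. The logits tensor and the VK value are made up for illustration. In that case backward() runs without error, but the gradient that flows back is numerically zero, because each row of exp(out) always sums to 1 regardless of the parameters.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(5, 3, requires_grad=True)  # stand-in for the pre-softmax model output
out = torch.log_softmax(logits, dim=1)          # log-probabilities (assumed)

VK = 7.5                                        # hypothetical value from the evaluation step
loss = torch.sum(torch.exp(out)) - torch.sum(torch.tensor(VK))
loss.backward()

total_prob = torch.sum(torch.exp(out)).item()   # about 5.0, the number of data points
max_grad = logits.grad.abs().max().item()       # about 0: VK contributes nothing
print(total_prob, max_grad)
```

So the graph stays connected through out, but VK has no influence on the update, and any parameter movement that is observed is unrelated to the evaluation value.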

Yes, this is what I thought. out is a tensor attached to the graph, and torch.sum(torch.exp(out)) is constant (the number of data points). I just use VK as a value to be subtracted from it. The greater VK, the smaller the loss.

OK, I found the problem. I was randomly sampling from my log_softmax output layer, and that breaks the gradient. To update the parameters when sampling randomly, we need to create a categorical distribution, sample from it, and then update the parameters using the log-derivative (score-function) trick:

def train(data):
    optimizer.zero_grad()  # Clear gradients.
    out = model(data.x, data.edge_index)  # Perform a single forward pass.
    # out is assumed to hold log_softmax outputs, so it is passed as logits;
    # the first positional argument of Categorical is probs.
    m = torch.distributions.categorical.Categorical(logits=out)
    L = m.sample()  # Sample one class per data point (no gradient flows through this).
    graph = build_graph(L)
    val = loss_value_function(graph)
    loss = -m.log_prob(L) * val  # Higher val pushes up the log-probability of the sampled classes.
    loss.sum().backward()  # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.
    return loss

def loss_value_function(graph):
    val = evaluate_graph(graph)
    return val
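For anyone who wants to try the idea without the GNN, here is a self-contained sketch of one training step. Everything in it is a stand-in: a Linear layer replaces the model, random features replace the data, and toy_reward is a hypothetical non-differentiable score in place of build_graph and evaluate_graph.

```python
import torch

torch.manual_seed(0)
num_points, num_features, K = 8, 4, 3
x = torch.randn(num_points, num_features)
model = torch.nn.Linear(num_features, K)  # stand-in for the GNN
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

def toy_reward(labels):
    # Hypothetical score: number of distinct classes used (always >= 1).
    return float(len(set(labels.tolist())))

before = model.weight.detach().clone()

optimizer.zero_grad()
out = torch.log_softmax(model(x), dim=1)         # log-probabilities
m = torch.distributions.Categorical(logits=out)
L = m.sample()                                   # one class per point, no gradient
reward = toy_reward(L)                           # plain Python float, outside the graph
loss = -(m.log_prob(L) * reward).sum()           # score-function loss: maximizes expected reward
loss.backward()
optimizer.step()

weights_changed = not torch.equal(before, model.weight.detach())
print(weights_changed)
```

The reward never needs a gradient of its own here: the gradient reaches the parameters through m.log_prob(L), and the reward only scales it.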