Calculating loss with numpy function

Is it okay to convert a tensor to numpy, calculate the loss value there, convert that value back to a tensor, and backpropagate?

That won’t work as you are detaching the computation graph by calling numpy operations.
Autograd won’t be able to keep record of these operations, so that you won’t be able to simply backpropagate.
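
To make the point above concrete, here is a minimal sketch (the tensor `w` and the toy loss are made up for illustration, not taken from the original question). It shows that a loss computed in NumPy and wrapped back into a tensor has no history, so nothing reaches `w`:

```python
import torch

w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
pred = w * 2.0                      # still tracked by Autograd

# Leaving PyTorch: detach() is required before numpy(), and it cuts the graph.
loss_np = (pred.detach().numpy() ** 2).sum()

# Wrapping the NumPy result creates a brand-new leaf tensor with no history.
loss = torch.tensor(loss_np, requires_grad=True)
loss.backward()                     # runs, but only reaches `loss` itself

print(loss.grad_fn)  # no grad_fn, because the tensor has no recorded history
print(w.grad)        # stays None, because backward never reached w
```

Note that `backward()` does not raise an error here, which is probably why this mistake is easy to miss.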

If you need the numpy functions, you would need to implement your own backward function and it should work again. Have a look at this tutorial for more information.
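
As a rough sketch of what such a custom backward could look like (the class name `NumpySquare` and the choice of `np.square` are only illustrative assumptions, not the function from the question), the forward pass runs in NumPy and the gradient is supplied by hand:

```python
import numpy as np
import torch

class NumpySquare(torch.autograd.Function):
    """Hypothetical example: y = x**2 computed in NumPy, gradient written by hand."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)                      # keep x for the backward pass
        out = np.square(x.detach().cpu().numpy())     # the NumPy part
        return torch.from_numpy(out).to(x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return grad_output * 2.0 * x                  # d(x**2)/dx = 2x, chained with grad_output

x = torch.tensor([1.0, -2.0, 3.0], requires_grad=True)
loss = NumpySquare.apply(x).sum()
loss.backward()
print(x.grad)  # should equal 2 * x
```

This only works if you can actually write down the derivative of the NumPy operation; for something like a distance transform that may not be straightforward.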

Basically, Autograd will track all operations as long as you stay in PyTorch land. Could you check whether your numpy functions are available in PyTorch?

Thank you for the response.
I wanted to perform neural style transfer, but instead of the content and style loss functions, I intended to use a loss that contains a distance transform. The numpy function is here: Distance transform, but it seems PyTorch doesn’t have it yet.
I’ll try the tutorial though.

I also implemented a custom loss with numpy. The custom loss doesn’t have a backward function, but my model works. Why?

Your custom loss function using numpy should detach the loss from the computation graph, so all PyTorch parameters that were used before the detaching won’t get a gradient.
Could you check it by printing the grad attribute of some parameters after calling backward?
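
A quick way to run that check might look like this (the `Linear` model and random input are hypothetical stand-ins for your own model and data):

```python
import torch

model = torch.nn.Linear(4, 1)            # hypothetical stand-in model
out = model(torch.randn(8, 4))

# Loss computed in NumPy and wrapped back into a tensor (the graph is cut here).
loss_value = float((out.detach().numpy() ** 2).mean())
loss = torch.tensor(loss_value, requires_grad=True)
loss.backward()

for name, p in model.named_parameters():
    print(name, p.grad)                  # None means this parameter got no gradient
```

If every parameter prints `None`, the optimizer step does not change anything based on this loss, even though the code runs without errors.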

Hi, have you solved the problem? I also want to use a numpy-based loss function, converted back to a tensor, for neural style transfer. Thank you very much!

Hi, I’ve followed this tutorial: https://pytorch.org/docs/master/notes/extending.html and somehow it worked.

Not sure if I should start a new thread… I’m calculating a weight map for a segmentation task that needs a distance transform (it’ll weight a CE loss). Everything is in PyTorch land except the weight map. In the end, this weight map is just numbers. Will that backpropagate?

(How does cross_entropy loss (torch._C._nn.nll_loss) use the weight argument after all?)

It should work, since your weight map would just scale the loss and thus the gradients.
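
Here is a small sketch of that scaling effect, assuming a per-pixel weight map (the `weight` argument of `cross_entropy` itself takes one weight per class, so the map is applied by hand with `reduction="none"`). The random `weight_np` array is just a stand-in for a distance-transform result:

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 3, 4, 4, requires_grad=True)   # (N, C, H, W)
target = torch.randint(0, 3, (2, 4, 4))                # (N, H, W)

# Stand-in for a weight map computed outside PyTorch (e.g. via a distance transform).
weight_np = np.random.default_rng(0).uniform(0.5, 2.0, size=(2, 4, 4)).astype(np.float32)
weight = torch.from_numpy(weight_np)                   # constant, needs no gradient

def grad_of(w):
    # Per-pixel CE, scaled by the map, then averaged; returns d(loss)/d(logits).
    per_pixel = F.cross_entropy(logits, target, reduction="none")
    (g,) = torch.autograd.grad((w * per_pixel).mean(), logits)
    return g

g_plain = grad_of(torch.ones_like(weight))
g_weighted = grad_of(weight)

# The weighted gradient is the plain gradient scaled per pixel by the map.
scaled_ok = torch.allclose(g_weighted, weight.unsqueeze(1) * g_plain, atol=1e-6)
print(scaled_ok)
```

So the gradients flow through the logits as usual; the map itself gets no gradient, which is fine since it is not something you are training.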

Hi, what if we add some components to the loss, and those components are computed after detaching?
L = pytorch loss + numpy loss
The resulting L is also a PyTorch tensor.

My network’s performance is affected, although it looks like the numpy loss part should have no effect during training.
What could be the reason for that effect?

There won’t be any effect, as you are adding a constant value to the loss.
How reproducible is this effect? E.g. are you seeing the training constantly affected by it for 10 different runs?
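
For reference, a minimal sketch showing that a constant added to the loss leaves the gradients unchanged (the model, data, and the value `123.4` are arbitrary stand-ins for your setup and your detached numpy loss):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)            # hypothetical stand-in model
x, y = torch.randn(5, 3), torch.randn(5, 1)

def grads(extra):
    # Gradients of (MSE + extra) with respect to the model parameters.
    model.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y) + extra
    loss.backward()
    return [p.grad.clone() for p in model.parameters()]

g_without = grads(0.0)
g_with = grads(123.4)                    # stand-in for a detached NumPy loss value

same = all(torch.allclose(a, b) for a, b in zip(g_without, g_with))
print(same)
```

If the gradients match like this in your setup as well, differences between runs are more likely to come from seeding or other sources of randomness than from the added term.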

I have some doubts about when we need to create our own backward function to include an external value in the loss function. For example, to solve the case below we would need to write our own backward function, but I do not understand how to do this: such a function has to return tensors with gradients, and when we convert to numpy we don’t get gradients, am I right?

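import numpy as np
import torch

criteria = torch.nn.MSELoss()
outputs, latent_space = model(X)
# detach() + numpy() leave the Autograd graph, so everything below is untracked
L_S_nump = latent_space.detach().cpu().numpy()
value, counts = np.unique(L_S_nump, return_counts=True)
norm_counts = counts / counts.sum()
# Shannon entropy (in bits) of the latent values; this is a plain NumPy float
entro = -(norm_counts * np.log2(norm_counts)).sum()
mse = criteria(O, E)  # O and E as in the original post
loss = mse + entro    # entro only adds a constant; no gradient flows through it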

Yes, if you are using numpy operations, Autograd won’t be able to track these operations and you would thus detach the computation graph. You could write a custom autograd.Function and define your backward method there as described in this tutorial.
However, based on your code snippet you could also replace the numpy functions (np.unique, sum(), np.log2) with the PyTorch equivalents.
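
A rough sketch of the same entropy computed with PyTorch functions (the `latent` tensor is a made-up example). One caveat, as far as I can tell: the counts returned by `torch.unique` are integers, so this term would still not carry a gradient back to the latent values, and a differentiable surrogate would be needed if it should influence training:

```python
import torch

# Made-up latent values; in the real code this would come from the model.
latent = torch.tensor([0.0, 1.0, 1.0, 2.0, 2.0, 2.0], requires_grad=True)

# PyTorch counterparts of np.unique, sum() and np.log2.
values, counts = torch.unique(latent, return_counts=True)
probs = counts.float() / counts.sum()
entropy = -(probs * torch.log2(probs)).sum()

print(entropy.item())           # Shannon entropy in bits
print(entropy.requires_grad)    # counts are integers, so no gradient path exists
```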

Thank you for the reply, I will follow that tip.

# Backprop to compute gradients of a, b, c, d with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_a = grad_y_pred.sum()
grad_b = (grad_y_pred * x).sum()
grad_c = (grad_y_pred * x ** 2).sum()
grad_d = (grad_y_pred * x ** 3).sum()
I am following this example: Calculate backpass and forward using numpy for third degree polynomial. I am confused about why we multiply (y_pred - y) by 2.0.

My second question: why does this hyphen appear when updating the weight?
a -= learning_rate * grad_a

If I am not wrong, the loss is the square of (y_pred - y), so 2.0 is the power. Can we also calculate it as 2 * (y_pred - y) or np.square(y_pred - y)? If this is correct, please answer my second question: why do we use the hyphen sign with the variable?

  1. The 2 is the factor that comes from the derivative of the square: d/dy_pred (y_pred - y)^2 = 2 * (y_pred - y).
  2. The “hyphen” is part of the in-place subtraction operator -=, which subtracts learning_rate * grad_a from a. The out-of-place version would be:
a = a - (learning_rate * grad_a)
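
Both points can be checked numerically. The sketch below uses made-up data and random coefficients (not the values from the tutorial), compares the hand-written `grad_a` against a finite-difference estimate, and confirms that the in-place and out-of-place updates give the same result:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 50)
y = np.sin(x)
a, b, c, d = rng.standard_normal(4)      # random polynomial coefficients

def loss_fn(a_val):
    # Sum of squared errors as a function of the coefficient a only.
    y_pred = a_val + b * x + c * x ** 2 + d * x ** 3
    return np.square(y_pred - y).sum()

# Hand-written gradient: the 2.0 comes from differentiating the square.
y_pred = a + b * x + c * x ** 2 + d * x ** 3
grad_y_pred = 2.0 * (y_pred - y)
grad_a = grad_y_pred.sum()

# Central finite difference as an independent check.
eps = 1e-6
numeric = (loss_fn(a + eps) - loss_fn(a - eps)) / (2 * eps)
grad_ok = np.isclose(grad_a, numeric, atol=1e-4)
print(grad_ok)

learning_rate = 1e-3
a_out = a - learning_rate * grad_a       # out-of-place version
a -= learning_rate * grad_a              # in-place version
same_update = (a == a_out)
print(same_update)
```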