Is it okay to convert a tensor to numpy, calculate the loss value, convert that value back to a tensor, and backpropagate?
That won’t work as you are detaching the computation graph by calling numpy operations.
Autograd won’t be able to keep record of these operations, so that you won’t be able to simply backpropagate.
If you need the numpy functions, you would need to implement your own backward function and it should work again. Have a look at this tutorial for more information.
Basically, autograd will track all operations as long as you stay in PyTorch land. Could you check if your numpy functions are available in PyTorch?
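To make the point above concrete, here is a minimal sketch. The tensors `w` and `x` are made-up toy values, not anything from the original question. It compares a loss built only from PyTorch ops with the same value routed through numpy.

```python
import torch

torch.manual_seed(0)
w = torch.randn(3, requires_grad=True)  # toy "parameter"
x = torch.randn(3)                      # toy input

# Loss computed purely with PyTorch ops: autograd records the graph.
loss_torch = ((w * x).sum() - 1.0) ** 2
print(loss_torch.grad_fn is not None)

# Same value routed through numpy: .detach() is required before .numpy(),
# and the tensor created from the result has no history.
loss_np = ((w * x).sum().detach().numpy() - 1.0) ** 2
loss_roundtrip = torch.tensor(float(loss_np))
print(loss_roundtrip.requires_grad)

# Backpropagating through the round-tripped value fails.
try:
    loss_roundtrip.backward()
    backward_failed = False
except RuntimeError:
    backward_failed = True
print(backward_failed)
```

The two losses hold the same number, but only `loss_torch` can be backpropagated to `w`.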
Thank you for the response.
I wanted to perform neural style transfer, but instead of the content and style loss functions, I intended to use a loss that has a distance transform in it. The numpy function is here: Distance transform, but it seems PyTorch doesn’t have it yet.
I’ll try the tutorial though.
I also implemented a custom loss with numpy. The custom loss doesn’t have a backward function, but my model works. Why?
Your custom loss function using numpy should detach the loss from the computation graph, so that all PyTorch parameters, which were used before detaching won’t get a gradient.
Could you check it by printing the grad attribute of some parameters after calling
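A sketch of the suggested check, assuming the call meant above is `loss.backward()` (the sentence is cut off, so that is a guess). The model, data, and numpy loss are hypothetical stand-ins. Wrapping the numpy result in a new tensor with `requires_grad=True` lets `backward()` run without an error, which can make it look like training "works", yet no parameter receives a gradient.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical stand-in model
x = torch.randn(8, 4)
y = torch.randn(8, 1)

out = model(x)

# Loss computed in numpy, then wrapped in a brand-new leaf tensor.
loss_np = ((out.detach().numpy() - y.numpy()) ** 2).mean()
loss = torch.tensor(float(loss_np), requires_grad=True)

# This runs without error, but the graph stops at `loss` itself.
loss.backward()

# Every parameter gradient is still None: nothing was backpropagated.
for name, param in model.named_parameters():
    print(name, param.grad)
```

If the printed gradients are `None`, the optimizer step has nothing to apply and the weights do not change.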
Hi, have you solved the problem? I also want to use a numpy-based loss function, converted to a tensor, for neural style transfer. Thank you very much.
Hi, I followed this tutorial, https://pytorch.org/docs/master/notes/extending.html, and somehow it worked.
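For readers who want to see roughly what that approach looks like, here is a minimal sketch of a custom `torch.autograd.Function` whose forward pass runs in numpy and whose backward pass is written by hand. The class name `NumpySquaredError` and the mean-squared-error loss are illustrative assumptions, not the poster's actual code. A distance transform would need its own gradient definition, and this sketch does not attempt one.

```python
import numpy as np
import torch


class NumpySquaredError(torch.autograd.Function):
    """Hypothetical example: mean squared error computed with numpy."""

    @staticmethod
    def forward(ctx, pred, target):
        ctx.save_for_backward(pred, target)
        pred_np = pred.detach().cpu().numpy()
        target_np = target.detach().cpu().numpy()
        loss_np = np.mean((pred_np - target_np) ** 2)
        return torch.tensor(float(loss_np), dtype=pred.dtype, device=pred.device)

    @staticmethod
    def backward(ctx, grad_output):
        pred, target = ctx.saved_tensors
        pred_np = pred.detach().cpu().numpy()
        target_np = target.detach().cpu().numpy()
        # Hand-derived gradient of mean((pred - target)^2) w.r.t. pred.
        grad_np = 2.0 * (pred_np - target_np) / pred_np.size
        grad_pred = grad_output * torch.from_numpy(grad_np).to(pred)
        # No gradient is returned for the target.
        return grad_pred, None


torch.manual_seed(0)
pred = torch.randn(4, 3, requires_grad=True)
target = torch.randn(4, 3)

loss = NumpySquaredError.apply(pred, target)
loss.backward()

# Reference gradient from the built-in, fully tracked MSE loss.
ref = pred.detach().clone().requires_grad_(True)
torch.nn.functional.mse_loss(ref, target).backward()
print(torch.allclose(pred.grad, ref.grad, atol=1e-6))
```

Comparing against a tracked PyTorch implementation, as in the last lines, is a cheap way to catch mistakes in a hand-written backward.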
Not sure if I should start a new thread… I’m calculating a weight map for a segmentation task, that needs distance transform (it’ll weight a CE loss). Everything is on PyTorch land except the weight map. In the end, this weight map is just numbers. Will that backpropagate?
(How does cross_entropy loss (torch._C._nn.nll_loss) use the weight argument after all?)
It should work, since your weight map would just scale the loss and thus the gradients.
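A small sketch of that setup, with made-up shapes and a random array standing in for the distance-transform-based weight map. As far as the `weight` argument goes, `F.cross_entropy` expects a 1-D tensor with one weight per class, so a per-pixel map is applied here by hand using `reduction="none"`.

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)
np.random.seed(0)

logits = torch.randn(2, 3, 4, 4, requires_grad=True)  # (N, C, H, W)
target = torch.randint(0, 3, (2, 4, 4))               # (N, H, W)

# Hypothetical stand-in for a weight map computed outside PyTorch.
weight_map_np = np.random.rand(2, 4, 4).astype(np.float32) + 0.5
weight_map = torch.from_numpy(weight_map_np)          # constant, no grad needed

# Per-pixel loss, scaled by the constant map, then reduced.
per_pixel = F.cross_entropy(logits, target, reduction="none")  # (N, H, W)
loss = (per_pixel * weight_map).mean()
loss.backward()

print(logits.grad.shape)
```

The weight map never needs a gradient itself. It only scales the per-pixel losses, and therefore the gradients that flow back to `logits`.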
Hi, what if we add some components to the loss, and those components are computed after detaching?
L = pytorch loss + numpy loss
The resulting L is also a PyTorch tensor.
My network’s performance is affected, although it looks like the numpy loss part should have no effect during training.
What could be the reason for that effect?
There won’t be any effect, as you are adding a constant value to the loss.
How reproducible is this effect? E.g. are you seeing the training constantly affected by it for 10 different runs?
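One way to check this directly is to compare the parameter gradients with and without the detached term. The sketch below uses a hypothetical toy model and an arbitrary numpy-computed value as the "numpy loss".

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)   # hypothetical toy model
x = torch.randn(8, 4)
y = torch.randn(8, 1)


def parameter_grads(extra):
    """Backpropagate mse + extra and return copies of the gradients."""
    model.zero_grad()
    loss = F.mse_loss(model(x), y) + extra
    loss.backward()
    return [p.grad.clone() for p in model.parameters()]


# Gradients of the PyTorch loss alone.
grads_plain = parameter_grads(0.0)

# Gradients with a numpy-computed (detached) term added to the loss.
numpy_term = float(np.abs(model(x).detach().numpy()).mean())
grads_with_constant = parameter_grads(numpy_term)

same = all(torch.allclose(a, b) for a, b in zip(grads_plain, grads_with_constant))
print(same)
```

If the gradients match, any difference between training runs has to come from somewhere else, such as seeding or other nondeterminism.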
I have some doubts about when we need to create our own backward function to include an external value in the loss function. For example, to solve this we need to create our own backward function, but I don’t understand how to do it: when we create this type of function we need to return tensors with gradients, and when we convert to numpy we don’t get gradients, am I right?
criteria = torch.nn.MSELoss()
outputs, latent_space = model(X)
latent_space = latent_space.cpu()
L_S_nump = latent_space.detach().numpy()
value, counts = np.unique(L_S_nump, return_counts=True)
norm_counts = counts / counts.sum()
entro = -(norm_counts * np.log2(norm_counts)).sum()
mse = criteria(O, E)
loss = mse + entro
Yes, if you are using numpy operations, Autograd won’t be able to track these operations and you would thus detach the computation graph. You could write a custom autograd.Function and define your backward method there as described in this tutorial.
However, based on your code snippet you could also replace the numpy functions (np.log2) with the PyTorch equivalents.
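As a rough sketch of that replacement, the entropy part of the snippet could be written with `torch.unique` and `torch.log2`. The `latent` tensor below is a made-up example. One caveat worth checking in your own code: the counts returned by `torch.unique` are integer tensors, so this term still carries no gradient back to the model and behaves like a constant in the loss.

```python
import torch

# Made-up latent values; in the question these come from the model.
latent = torch.tensor([0.0, 0.0, 1.0, 2.0])

# PyTorch equivalents of np.unique(..., return_counts=True) and np.log2.
values, counts = torch.unique(latent, return_counts=True)
probs = counts.float() / counts.sum()
entropy = -(probs * torch.log2(probs)).sum()

print(entropy.item())
print(entropy.requires_grad)
```

Staying in PyTorch avoids the CPU and numpy round trip, but making the entropy term actually trainable would need a differentiable estimate of the distribution, which is beyond this snippet.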
Thank you for reply, I will follow that tip.
# Backprop to compute gradients of a, b, c, d with respect to loss
grad_y_pred = 2.0 * (y_pred - y)
grad_a = grad_y_pred.sum()
grad_b = (grad_y_pred * x).sum()
grad_c = (grad_y_pred * x ** 2).sum()
grad_d = (grad_y_pred * x ** 3).sum()
I am following this example: Calculate backpass and forward using numpy for third degree polynomial. I am confused about why we multiply (y_pred - y) by 2.0.
My second question: when updating the weights, why is this hyphen there?
a -= learning_rate * grad_a
If I am not wrong, it’s the square of (y_predict - y), so 2.0 comes from the power. Can we also calculate it as 2 * (y_predict - y) or np.square(y_predict - y)? If this is correct, please answer my second question: why do we use the hyphen sign with the variable?
- 2 would be the factor used in the derivative of the square.
- The “hyphen” is an in-place subtraction, which will subtract learning_rate * grad_a from a. The out-of-place version would be:
a = a - (learning_rate * grad_a)
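Both answers can be checked numerically. The loss in that example is the sum of (y_pred - y)^2, whose derivative with respect to y_pred is 2 * (y_pred - y); np.square(y_pred - y) would give the loss terms themselves, not their gradient. The sketch below is an illustration, not the tutorial's code. It uses random starting coefficients and float64 as arbitrary choices, and compares the manual gradients with autograd before applying the in-place update.

```python
import math
import torch

torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 2000, dtype=torch.float64)
y = torch.sin(x)

# Random starting coefficients of the third-degree polynomial.
a, b, c, d = (torch.randn((), dtype=torch.float64, requires_grad=True)
              for _ in range(4))

y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()  # autograd gradients land in a.grad, b.grad, ...

with torch.no_grad():
    # Manual gradients, as in the example being discussed.
    grad_y_pred = 2.0 * (y_pred - y)   # derivative of the square
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    learning_rate = 1e-6
    expected_a = a - learning_rate * grad_a   # out-of-place result
    a -= learning_rate * grad_a               # in-place update of a

print(torch.allclose(grad_a, a.grad), torch.allclose(grad_d, d.grad))
print(torch.allclose(a, expected_a))
```

The in-place form modifies the existing variable instead of creating a new one. The resulting value is the same as with the out-of-place form.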