I want to scale a vector by its L2 norm, i.e.,
y = x / norm(x, p=2). The analytic gradient of this operation w.r.t. each element of the vector is
1 / ||x||_2 - x_i^2 / ||x||_2^3. However, I found that the gradient computed by PyTorch differs from this analytic solution. Am I making a mistake here, or is PyTorch computing it incorrectly?
Here is a minimal example to reproduce the inconsistency:
import torch

# Function to compute the gradient of vector normalization
def gradient_vector_norm(x):
    # Convert input to a PyTorch tensor that tracks gradients
    x = torch.tensor(x, requires_grad=True, dtype=torch.float)
    # Compute the L2 norm and the normalized vector, then sum for a scalar loss
    norm_x = x.norm(p=2)
    y = torch.div(x, norm_x).sum()
    y.backward()
    return x.grad, norm_x

# Test the function
vector = [3.0, 4.0]
gradient, norm_x = gradient_vector_norm(vector)
print("Gradient w.r.t. each element:", gradient)

analytic_gradient = 1. / norm_x - torch.tensor(vector) ** 2 / norm_x ** 3
print(analytic_gradient)
PyTorch gives a gradient of [0.0320, -0.0240], while the analytic solution is [0.1280, 0.0720]; the two are inconsistent.
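While debugging, I also worked out the full 2x2 Jacobian of y = x / ||x||_2 by hand at x = [3, 4], using J_ij = delta_ij / ||x|| - x_i * x_j / ||x||^3. Interestingly, its diagonal matches my analytic formula, while its row sums match what PyTorch reports (a quick pure-Python check; the variable names are mine):

```python
# Build the full Jacobian of y = x / ||x||_2 at x = [3, 4] by hand:
# J_ij = delta_ij / ||x|| - x_i * x_j / ||x||^3
x = [3.0, 4.0]
norm = sum(v * v for v in x) ** 0.5  # ||x||_2 = 5.0

jacobian = [
    [(1.0 if i == j else 0.0) / norm - x[i] * x[j] / norm**3
     for j in range(len(x))]
    for i in range(len(x))
]

diagonal = [jacobian[i][i] for i in range(len(x))]
row_sums = [sum(row) for row in jacobian]

print("diagonal:", diagonal)   # approx [0.128, 0.072]  -- my analytic formula
print("row sums:", row_sums)   # approx [0.032, -0.024] -- PyTorch's output
```

So the diagonal alone gives [0.128, 0.072], but summing each row (i.e., accounting for the off-diagonal terms -x_i * x_j / ||x||^3) gives exactly the [0.032, -0.024] that PyTorch prints.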