I want to scale a vector by its L2 norm, i.e., `y = x / norm(x, p=2)`.

The analytic gradient of this operation w.r.t. each element of the vector is `1 / ||x||_2 - x_i^2 / ||x||_2^3`.

However, I found that the gradient computed by PyTorch differs from this analytic solution. I am not sure whether I made a mistake here or PyTorch is not computing it correctly.

Here is a minimal example that reproduces the inconsistency:

```
import torch


# Compute the autograd gradient of sum(x / ||x||_2) w.r.t. x
def gradient_vector_norm(x):
    # Convert the input to a float tensor that tracks gradients
    x = torch.tensor(x, requires_grad=True, dtype=torch.float)
    # Normalize the vector, then reduce to a scalar so backward() can be called
    norm_x = x.norm(p=2)
    y = torch.div(x, norm_x).sum()
    y.backward()
    return x.grad, norm_x


# Test the function
vector = [3.0, 4.0]
gradient, norm_x = gradient_vector_norm(vector)
print("Gradient w.r.t. each element:", gradient)

# Analytic formula from the text above: 1 / ||x|| - x_i^2 / ||x||^3
analytic_gradient = 1. / norm_x - torch.tensor(vector) ** 2 / (norm_x ** 3)
print(analytic_gradient)
```

PyTorch gives a gradient of `[0.0320, -0.0240]`, while the analytic formula gives `[0.1280, 0.0720]`; the two are inconsistent.
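
To narrow down which quantity each of these two results corresponds to, one possible sanity check is to compute the full Jacobian of the normalization and compare its pieces against both numbers. This is only a sketch: it assumes `torch.autograd.functional.jacobian` is available in the installed PyTorch version, and the helper `normalize` and the names `diag`, `summed`, and `analytic_diag` are introduced here purely for illustration.

```python
import torch
from torch.autograd.functional import jacobian


def normalize(x):
    # Hypothetical helper: y = x / ||x||_2
    return x / x.norm(p=2)


x = torch.tensor([3.0, 4.0])
norm_x = x.norm(p=2)

# Full Jacobian, with J[j, i] = d y_j / d x_i
J = jacobian(normalize, x)

# Diagonal entries d y_i / d x_i
diag = torch.diagonal(J)

# Summing over the output index j gives d(sum_j y_j) / d x_i,
# which is what calling backward() on y.sum() accumulates into x.grad
summed = J.sum(dim=0)

# The formula from the question: 1 / ||x|| - x_i^2 / ||x||^3
analytic_diag = 1.0 / norm_x - x ** 2 / norm_x ** 3

print("Jacobian:\n", J)
print("Diagonal:", diag)
print("Analytic formula:", analytic_diag)
print("Sum over outputs:", summed)
```

If this runs as expected, the diagonal should agree with the analytic formula (`[0.1280, 0.0720]`), while the sum over outputs should agree with `x.grad` from the example above (`[0.0320, -0.0240]`). That would suggest the two numbers measure different things: the formula covers only `d y_i / d x_i`, whereas `backward()` on `y.sum()` also includes the off-diagonal terms `d y_j / d x_i` for `j != i`.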