Performs Lp normalization of inputs over specified dimension.
Does v = v / max(‖v‖_p, ϵ)
for each subtensor v over dimension dim of input. Each subtensor is flattened into a vector.

So if I do the following

import torch
import torch.nn.functional as F
x = torch.randn((4, 3, 32, 32))
x = F.normalize(x, dim=0, p=2)

I would expect each subtensor along dim 0 (for instance x[0]) to have an L2 norm equal to 1.
However, this isn’t the case.

I agree that the description is not as clear as it could be, but maybe it’s more the shaping that isn’t clear rather than the mathematical bits.

for each subtensor v over dimension dim of input.

Maybe it becomes clearer when you add the shape information: for a tensor of size (n_0, …, n_dim, …, n_k), each n_dim-element vector v along dimension dim is transformed as … and the equation.
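
To make the shape reading concrete, here is a small check using the tensor from the question. It is only a sketch: the seed and the variable names (y, norms_along_dim0, whole_slice_norm) are my own choices, not anything from the docs. With dim=0, the vectors being normalized are the 4-element vectors x[:, c, h, w], one per (c, h, w) position, so the unit norms show up along dim 0 and not over a whole slice such as x[0].

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # arbitrary seed, only for reproducibility
x = torch.randn(4, 3, 32, 32)
y = F.normalize(x, p=2, dim=0)

# One norm per (c, h, w) position, each taken over the 4 entries along dim 0.
norms_along_dim0 = torch.linalg.vector_norm(y, ord=2, dim=0)
print(norms_along_dim0.shape)  # one entry per (c, h, w) position
print(torch.allclose(norms_along_dim0, torch.ones_like(norms_along_dim0), atol=1e-5))

# The norm of the whole slice y[0] (3 * 32 * 32 values) is a different quantity
# and is generally not 1.
whole_slice_norm = torch.linalg.vector_norm(y[0])
print(whole_slice_norm)
```

So the normalization does what the formula says; it is just applied to many short vectors along dim, not to the slices obtained by indexing dim.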

Each subtensor is flattened into a vector, i.e. ‖v‖_p is not a matrix norm.

This sentence seems particularly misleading, and I would suggest striking it: given that the things being normed are one-dimensional, how could the norm be a matrix norm?
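
If what you actually want is for each x[i] as a whole to have unit L2 norm, one possible sketch (my assumption about the intent, not something the docs prescribe) is to flatten every dimension except the first, normalize along the flattened dimension, and restore the shape. The names z and per_sample_norms are just illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3, 32, 32)

# Treat each sample x[i] as one long vector of 3 * 32 * 32 entries,
# normalize each of those vectors, then restore the original shape.
z = F.normalize(x.flatten(start_dim=1), p=2, dim=1).reshape(x.shape)

# One norm per sample, taken over all remaining entries.
per_sample_norms = torch.linalg.vector_norm(z.flatten(start_dim=1), ord=2, dim=1)
print(per_sample_norms)
```

Here every z[i] has norm 1 up to floating-point error, which is the behaviour the question expected from dim=0.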