Question about functional.normalize and torch.norm

I am probably misunderstanding something but:

In the docs of functional.normalize we can read:

Performs Lp normalization of inputs over specified dimension.
Does v = v / max(‖v‖_p, ε)
for each subtensor v over dimension dim of input. Each subtensor is flattened into a vector.

So if I do the following

import torch
import torch.nn.functional as F
x = torch.randn((4, 3, 32, 32))
x = F.normalize(x, dim=0, p=2)

I would expect that each subtensor along dim 0 (for instance x[0]) will have a L2 norm equal to 1.
However, this isn’t the case

torch.sqrt(torch.sum(x[0]**2)) # != 1

(I use pytorch 0.4.1 with CUDA 9.2)


The tensor is normalized over dimension dim, such that:

(x[:, 0, 0, 0]**2).sum() == 1
(x[:, 0, 0, 1]**2).sum() == 1
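To make the shape semantics concrete, here is a short sketch (assuming PyTorch is installed): with dim=0, each 4-element vector taken across the batch dimension is normalized, not the whole slice x[0].

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Normalize across dim 0: each vector x[:, c, h, w] gets unit L2 norm.
x = F.normalize(torch.randn(4, 3, 32, 32), dim=0, p=2)

one = torch.tensor(1.0)
per_index_unit = torch.allclose((x[:, 0, 0, 0] ** 2).sum(), one)  # True
whole_slice_unit = torch.allclose((x[0] ** 2).sum(), one)         # False
print(per_index_unit, whole_slice_unit)
```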

In your use case you could do the following:

x_ = F.normalize(x.view(x.size(0), -1), dim=1, p=2).view(x.size())
(x_[0]**2).sum() == 1
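As a quick sanity check of that workaround (a sketch, assuming PyTorch is installed): flattening each x[i] to a vector, normalizing along dim 1, and reshaping back gives every subtensor x_[i] a unit L2 norm.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3, 32, 32)

# Flatten each sample, L2-normalize the flat vectors, restore the shape.
x_ = F.normalize(x.view(x.size(0), -1), dim=1, p=2).view(x.size())

all_unit = all(
    torch.allclose((x_[i] ** 2).sum(), torch.tensor(1.0))
    for i in range(x.size(0))
)
print(all_unit)  # True
```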

I see thanks. In that case, isn’t the relevant part of the documentation a little bit misleading?

I’m not sure, as I’m not that familiar with the English mathematical description of such operations.

@tom is a great mathematician. Maybe he could put his 2 cents in.

Haha. Thanks.

I agree that the description is not as clear as it could be, but maybe it’s more the shaping that isn’t clear rather than the mathematical bits.

for each subtensor v over dimension dim of input.

Maybe it becomes clearer when you add the shape information: for a tensor of sizes (n_0, …, n_dim, …, n_k), each n_dim-element vector v along dimension dim is transformed as … and then the equation.

Each subtensor is flattened into a vector, i.e. ‖v‖_p is not a matrix norm.

This sentence seems particularly misleading, and I would suggest striking it: given that the things being normed are one-dimensional, they could not be subject to a matrix norm anyway.

Best regards


Edit: P.S.: I made this a quick PR. Thank you, @alpapado, for your feedback on the documentation! It helps us improve.