Why does the normal distribution have `loc` and `scale` as attributes?

The “normal” names IMO would be mu and sigma, or mean and std. Why were loc and scale chosen? Is there some background to this?

I was very confused by the names the first time I had to use the distributions…

Superficially, like many things in PyTorch, this follows NumPy, or in this case scipy.stats, which use the same loc and scale names.

Going more in depth, the names generalize the idea of shifting (loc) and rescaling the coordinates (scale) of a distribution relative to a “standard” form with location 0 and scale 1. For example, the lognormal distribution’s scale parameter also does scaling, rather than just changing σ.
Wikipedia has an entry on scale parameters that elaborates on the concept.
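
To make the location-scale idea concrete, here is a minimal sketch using only Python’s standard-library `statistics.NormalDist` rather than torch.distributions, so it runs without PyTorch installed. The helper names `location_scale_pdf` and `location_scale_sample` are hypothetical, made up for illustration. The sketch shows that a normal with a given loc and scale is just the standard normal shifted by loc and stretched by scale: sample as `loc + scale * z`, and evaluate the density by mapping `x` back to standard coordinates and dividing by `scale`.

```python
from statistics import NormalDist

# The "standard" member of the family: location 0, scale 1.
standard = NormalDist(mu=0.0, sigma=1.0)


def location_scale_pdf(x, loc, scale, base=standard):
    """Hypothetical helper: density of loc + scale * Z, where Z has density base.pdf.

    Shift x back by loc, divide by scale to get standard coordinates,
    evaluate the standard density, then divide by scale so the result
    still integrates to 1.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return base.pdf((x - loc) / scale) / scale


def location_scale_sample(z, loc, scale):
    """Hypothetical helper: map a standard draw z to a shifted, rescaled draw."""
    return loc + scale * z


loc, scale = 2.0, 3.0

# NormalDist supports arithmetic with constants, so shifting and scaling
# the standard normal yields a normal with mu=loc and sigma=scale.
shifted = standard * scale + loc
print(shifted)

for x in (-1.0, 0.5, 2.0, 4.5):
    direct = NormalDist(mu=loc, sigma=scale).pdf(x)
    via_standard = location_scale_pdf(x, loc, scale)
    print(f"x={x:+.1f}  direct={direct:.6f}  via standard={via_standard:.6f}")

# Turn a few standard draws into draws from the (loc, scale) distribution.
draws = [location_scale_sample(z, loc, scale) for z in standard.samples(3, seed=0)]
print(draws)
```

For the normal, loc and scale happen to coincide with the mean and standard deviation, which is why mu/sigma feel like the natural names. The generic names stay meaningful for other families where the location and scale are not the mean and std.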

Best regards
