Torch.var() and torch.std() return nan

jmaronas · March 4, 2019, 2:27pm

When a torch tensor has only one element, this call returns nan where it should return 0.

import torch
a=torch.tensor([0.2550])
print(a.var())

albanD · March 4, 2019, 2:32pm

Hi,

Isn’t the variance based on a set of n samples supposed to be:

var = sum_i (x_i - mean(x))**2 / (n-1)

jmaronas · March 4, 2019, 2:52pm

Well, as far as I know, that is the unbiased version of the maximum likelihood estimator of the variance of a Gaussian distribution. The empirical variance is defined as:

1/N sum_i (xi -mean)**2

Edit: I have checked online, and it seems that, under certain conditions, the unbiased estimate of the variance you mentioned also holds.
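To make the two formulas in this exchange concrete, here is a minimal pure-Python sketch. The helper names `biased_var` and `unbiased_var` are hypothetical and are not part of PyTorch; returning nan for a single element in the 1/(n-1) version is an assumption chosen to mirror the behaviour reported at the top of the thread.

```python
import math

def biased_var(xs):
    """1/N * sum_i (x_i - mean)**2, the form without Bessel's correction."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def unbiased_var(xs):
    """1/(N-1) * sum_i (x_i - mean)**2, the form with Bessel's correction."""
    n = len(xs)
    mean = sum(xs) / n
    squared_dev = sum((x - mean) ** 2 for x in xs)
    # With a single element the denominator is zero, so the estimate is
    # undefined; nan is returned here as an illustrative choice.
    return squared_dev / (n - 1) if n > 1 else math.nan

sample = [1.0, 2.0, 3.0, 4.0]
single = [0.2550]

print(biased_var(sample))    # 1.25
print(unbiased_var(sample))  # 5/3, roughly 1.667
print(biased_var(single))    # 0.0
print(unbiased_var(single))  # nan
```

With one element, the sum of squared deviations is zero in both cases; only the denominator differs, which is why one form gives 0 and the other is undefined.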

rasbt · March 4, 2019, 3:26pm

Sorry for nitpicking, but the variance is defined as E[(population mean - x)^2] and both formulas above are for empirical variances. But like you said, one is a biased and one an unbiased estimator.

I would prefer it if PyTorch implemented the biased estimator (without Bessel’s correction) by default, as it would then match NumPy and avoid the division by zero. In 99.9% of cases in DL, we don’t care about biased/unbiased estimates of the variance anyway, because our goal is to have unit variance for other numerical reasons.

That being said, I just see that it’s possible to turn it off:

function:: var(input, dim, keepdim=False, unbiased=True, out=None)
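As a sketch of how that flag could be used on the tensor from the first post, assuming a PyTorch version that accepts the `unbiased` keyword shown in the signature above (the variable names and the warning filter are illustrative additions):

```python
import warnings
import torch

a = torch.tensor([0.2550])

# Default call: Bessel's correction divides by n - 1 = 0 for one element.
# Some PyTorch versions also emit a warning here, so it is silenced.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    default_var = a.var()

# Turning the correction off divides by n instead.
biased_var_value = a.var(unbiased=False)
biased_std_value = a.std(unbiased=False)

print(default_var)       # nan
print(biased_var_value)  # zero variance
print(biased_std_value)  # zero standard deviation
```

The same keyword is passed to `std()` here on the assumption that it mirrors `var()`, as the thread title suggests both calls behave alike.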

jmaronas · March 4, 2019, 3:32pm

In Pattern Recognition and Machine Learning by Bishop (page 94), I checked that this formula is the maximum likelihood estimate of the variance of a Gaussian distribution.

The empirical variance is defined as I mentioned, because in the limit N -> infinity it converges to the true variance… Empirical statistics are statistics computed over the empirical distribution instead of the true one. You could see it as a Monte Carlo estimate of the true variance.
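For reference, the quantities discussed so far can be written side by side. This is a sketch assuming i.i.d. samples x_1, ..., x_N with finite variance; the symbols \hat{\sigma}^2_N and s^2 are notation introduced here, not taken from the thread.

```latex
\begin{align*}
\operatorname{Var}[X] &= \mathbb{E}\big[(X - \mu)^2\big],
  \qquad \mu = \mathbb{E}[X] \\
\hat{\sigma}^2_N &= \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2,
  \qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i \\
s^2 &= \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
  = \frac{N}{N-1}\, \hat{\sigma}^2_N \qquad (N \ge 2) \\
\mathbb{E}\big[\hat{\sigma}^2_N\big] &= \frac{N-1}{N}\, \operatorname{Var}[X],
  \qquad \mathbb{E}\big[s^2\big] = \operatorname{Var}[X] \qquad (N \ge 2)
\end{align*}
```

The factor N/(N-1) tends to 1 as N grows, so both estimators converge to the same value for large samples. For N = 1, \hat{\sigma}^2_1 = 0 while s^2 has the form 0/0, which is the nan in the original question.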

rasbt · March 4, 2019, 3:44pm

Will check later, but like you said, he is probably talking about an estimate (hence empirical). A Gaussian distribution does not involve discrete variables, so you need an integral to define the non-empirical variance. So, everything with a “sum” is empirical, as it is computed over a sample.

But in any case, this is beside the main point you raise with the 1/(n-1) term.

jmaronas · March 4, 2019, 3:51pm

Yes, that correction can be applied. Anyway, I think that the empirical variance should be computed as I mentioned, or at least that the documentation should make the choice explicit.

rasbt · March 4, 2019, 3:56pm

I agree with you that “1/N sum_i (xi -mean)**2” should be the default. It would then also avoid the “nan” issue you encounter by default (plus be more intuitive for people coming from NumPy).

Just checking the documentation:

.. function:: var(input, dim, keepdim=False, unbiased=True, out=None) -> Tensor
...
If :attr:`unbiased` is ``False``, then the variance will be calculated via the
biased estimator. Otherwise, Bessel's correction will be used.

Args:
...
    unbiased (bool): whether to use the unbiased estimation or not
...

It is mentioned there though.
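For comparison with the NumPy behaviour mentioned in this thread, here is a small sketch. `np.var` divides by `N - ddof` with `ddof=0` by default; the variable names are illustrative, and the warning filter only silences the degrees-of-freedom warning NumPy raises in the single-element case.

```python
import warnings
import numpy as np

single = np.array([0.2550])
sample = np.array([1.0, 2.0, 3.0, 4.0])

# NumPy default (ddof=0): divide by N, no Bessel's correction.
np_single_default = np.var(single)
np_sample_default = np.var(sample)

# ddof=1 applies Bessel's correction (divide by N - 1).
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    np_single_bessel = np.var(single, ddof=1)
np_sample_bessel = np.var(sample, ddof=1)

print(np_single_default)  # 0.0
print(np_single_bessel)   # nan
print(np_sample_default)  # 1.25
print(np_sample_bessel)   # 5/3, roughly 1.667
```

So `ddof=1` in NumPy plays the role of `unbiased=True` in the signature quoted above, and a single-element input gives nan there as well once the correction is requested.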