**TL;DR:** does computing `torch.var(x)` on a 1D tensor of n values have a quadratic runtime under autograd, or does it have a linear runtime in n?

**Longer version:**

If I have n values in a 1D tensor and I compute the variance within an `nn.Module`, will autograd compute an n-by-n Jacobian in quadratic runtime?

For example, if we have values

`x = torch.tensor([1.,2.,4.,8.,16.,32.,64.], requires_grad=True)`

then `y = torch.var(x)` will set `grad_fn` in `y`.

The variance can be conceptualized as computing all pairwise squared distances (a quadratic number of operations) and scaling the sum:

`torch.sum( (x.view(-1,1) - x.view(1,-1))**2 / (2*len(x)*(len(x)-1)) )`

Or the variance can be conceptualized as caching the mean (a linear number of operations), computing all squared distances to that mean (also linear), and then scaling the result:

`torch.sum( (x - torch.mean(x))**2 / (len(x)-1) )`
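As a quick sanity check (a minimal sketch, assuming the default unbiased estimator that `torch.var` uses), both formulations agree numerically with the built-in:

```python
import torch

x = torch.tensor([1., 2., 4., 8., 16., 32., 64.])
n = len(x)

v_builtin = torch.var(x)  # unbiased by default (divides by n - 1)
v_pairwise = torch.sum((x.view(-1, 1) - x.view(1, -1))**2 / (2 * n * (n - 1)))
v_mean = torch.sum((x - torch.mean(x))**2 / (n - 1))

assert torch.allclose(v_builtin, v_pairwise)
assert torch.allclose(v_builtin, v_mean)
```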

In the first way, it’s clear that all pairs of values in `x` affect `torch.var`, and so an n-by-n matrix will be computed by autograd.

In the second (more standard) way, the mean is cached, with autograd recording a dependency of the mean on each value of `x`. Then the mean interacts with every value of `x` to compute the variance. So in this case it appears the gradients will be quadratic as well. Is that the case?

Also, is there a way I could have delved into `y.grad_fn` to determine this for myself, without asking and without empirically measuring whether the runtime grows quadratically? Thank you!
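For reference, one way to peek at the recorded graph (a sketch; the node class names are internal and may vary across PyTorch versions) is to walk `grad_fn.next_functions` recursively and print each backward node:

```python
import torch

x = torch.tensor([1., 2., 4., 8., 16., 32., 64.], requires_grad=True)
y = torch.var(x)

def walk(fn, depth=0):
    """Print the autograd graph rooted at a grad_fn node."""
    if fn is None:
        return
    print("  " * depth + type(fn).__name__)
    for child, _ in fn.next_functions:
        walk(child, depth + 1)

walk(y.grad_fn)
```

This shows which backward ops were recorded, though not their asymptotic cost; for that, the C++ backward implementation (or timing) is still the ground truth.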