Here is an implementation of the covariance with pytorch from covariance-and-gradient-support

```
import torch
def cov(m, rowvar=False, inplace=False):
'''Estimate a covariance matrix given data.
Thanks :
- https://discuss.pytorch.org/t/covariance-and-gradient-support/16217/2
- https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L2276-L2494
Covariance indicates the level to which two variables vary together.
If we examine N-dimensional samples, `X = [x_1, x_2, ... x_N]^T`,
then the covariance matrix element `C_{ij}` is the covariance of
`x_i` and `x_j`. The element `C_{ii}` is the variance of `x_i`.
Args:
m: A 1-D or 2-D array containing multiple variables and observations.
Each row of `m` represents a variable, and each column a single
observation of all those variables.
rowvar: If `rowvar` is True, then each row represents a
variable, with observations in the columns. Otherwise, the
relationship is transposed: each column represents a variable,
while the rows contain observations.
Returns:
The covariance matrix of the variables.
'''
if m.dim() > 2:
raise ValueError('m has more than 2 dimensions')
if m.dim() < 2:
m = m.view(1, -1)
if not rowvar and m.size(0) != 1:
m = m.t()
m = m.type(torch.double)
fact = 1.0 / (m.size(1) - 1)
if inplace:
m -= torch.mean(m, dim=1, keepdim=True)
else :
m = m - torch.mean(m, dim=1, keepdim=True)
mt = m.t() # if complex: mt = m.t().conj()
return fact * m.matmul(mt).squeeze()
```

You can also calculate the gradient as follows.

```
def gradient(y, x, grad_outputs=None):
"""Compute dy/dx @ grad_outputs"""
if grad_outputs is None:
grad_outputs = torch.ones_like(y)
grad = torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
return grad
```

Now let’s take an example.

```
x, y = [90, 60, 90], [90, 90, 30]
A = torch.tensor([x, y], dtype = torch.double, requires_grad=True)
c = cov(m = A, rowvar = True)
```

tensor([[ 300., -300.], [-300., 1200.]], dtype=torch.float64, grad_fn=<MulBackward0>)

You can now calculate the gradient of `c`

w.r.t to `A`

, like this.

```
gradient(y = c, x = A)
```

tensor([[ 30., 0., -30.], [ 30., 0., -30.]], dtype=torch.float64, grad_fn=<AddBackward0>)

But I guess this is not the behaviour you want, because `torch.autograd.grad`

makes dimension reductions for multi-dimensional parameters.

So you can calculate `dc(xi, yj)/dx or dc(xi, yj)/dy`

like this :

```
grad_cij = gradient(y = c[i][j], x = A)
dc(xi, yj)/dx = grad_cij[0]
dc(xi, yj)/dy = grad_cij[1]
```

```
i = 0
j = 0
dcij_dA = gradient(y = c[i][j], x = A)
```

tensor([[ 10., -20., 10.], [ 0., 0., 0.]], dtype=torch.float64, grad_fn=)

Since `c[0][0]`

is independent of `y`

, `dcij_dA[1]`

is null. Similarly, you can see that the derivative of `c[1][1]`

with respect to `x`

is also null.

```
i = 1
j = 1
dcij_dA = gradient(y = c[i][j], x = A)
```

tensor([[ 0., 0., 0.], [ 20., 20., -40.]], dtype=torch.float64, grad_fn=<AddBackward0>)

You can also notice that `c[1][0] = c[0][1]`

, all having non-zero derivatives with respect to `x`

and `y`

.

```
i = 0
j = 1
dcij_dA = gradient(y = c[i][j], x = A)
```

tensor([[ 10., 10., -20.], [ 5., -10., 5.]], dtype=torch.float64, grad_fn=<AddBackward0>)

```
i = 1
j = 0
dcij_dA = gradient(y = c[i][j], x = A)
dcij_dA
```

tensor([[ 10., 10., -20.], [ 5., -10., 5.]], dtype=torch.float64, grad_fn=<AddBackward0>)

You can also calculate everything like this :

```
n, m = c.shape
dc_dA = torch.stack([gradient(y = c[i][j], x = A) for j in range(m) for i in range(n)])
```