# Derivative of covariance matrix

Suppose I have a bivariate covariance function `c(x,y)` and inputs `[x1,...,xn]` and `[y1,...,ym]`, and I would like to compute the covariance matrix with `i,j`th element `d/dx c(xi,yj)`. Here I have assumed that each `xi` and `yj` is scalar-valued; however, this could easily be extended to finding the `k`th partial derivative of the covariance matrix. In general, I would like this to extend to higher derivatives too (e.g. I would like to compute the matrix with `i,j`th element `d/dx d/dy c(xi,yj)`), but that may be an easy generalisation of the first problem.

I feel that PyTorch's automatic differentiation system should be able to handle this, but I haven't been able to come up with a nice solution. I think the issue has to do with broadcasting derivatives. For instance, to compute the non-differentiated covariance matrix I would do something like `c(x.unsqueeze(1), y)`, but the following code doesn't work as I would like (using `autograd.grad`):

```python
X = torch.Tensor([1, 2, 3]).requires_grad_(True).unsqueeze(1)
Y = torch.Tensor([1, 2, 3])
cov_matrix = c(X, Y)
torch.autograd.grad(cov_matrix, X)
# RuntimeError: grad can be implicitly created only for scalar outputs
```
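For concreteness, here is the kind of matrix I am after, computed by brute force with one `autograd.grad` call per column (assuming a toy RBF kernel for `c`, since my actual kernel doesn't matter here); I would like something less manual than this:

```python
import torch

# Hypothetical RBF kernel standing in for c(x, y); any smooth
# covariance function works the same way.
def c(x, y):
    return torch.exp(-(x - y) ** 2)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([0.5, 1.5])

K = c(x.unsqueeze(1), y)  # (3, 2) covariance matrix via broadcasting

# K[i, j] depends on x only through x[i], so one backward pass per
# column j (with a one-hot grad_outputs) recovers d/dx c(x_i, y_j).
dK_dx = torch.zeros_like(K)
for j in range(K.shape[1]):
    g = torch.zeros_like(K)
    g[:, j] = 1.0
    dK_dx[:, j] = torch.autograd.grad(K, x, grad_outputs=g, retain_graph=True)[0]
```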

Any help would be appreciated! Thanks

Here is an implementation of the covariance matrix in PyTorch, adapted from covariance-and-gradient-support:

```python
import torch

def cov(m, rowvar=False, inplace=False):
    '''Estimate a covariance matrix given data.

    Thanks:
    - https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L2276-L2494

    Covariance indicates the level to which two variables vary together.
    If we examine N-dimensional samples, `X = [x_1, x_2, ... x_N]^T`,
    then the covariance matrix element `C_{ij}` is the covariance of
    `x_i` and `x_j`. The element `C_{ii}` is the variance of `x_i`.

    Args:
        m: A 1-D or 2-D array containing multiple variables and observations.
            Each row of `m` represents a variable, and each column a single
            observation of all those variables.
        rowvar: If `rowvar` is True, then each row represents a
            variable, with observations in the columns. Otherwise, the
            relationship is transposed: each column represents a variable,
            while the rows contain observations.

    Returns:
        The covariance matrix of the variables.
    '''
    if m.dim() > 2:
        raise ValueError('m has more than 2 dimensions')
    if m.dim() < 2:
        m = m.view(1, -1)
    if not rowvar and m.size(0) != 1:
        m = m.t()
    m = m.type(torch.double)
    fact = 1.0 / (m.size(1) - 1)
    if inplace:
        m -= torch.mean(m, dim=1, keepdim=True)
    else:
        m = m - torch.mean(m, dim=1, keepdim=True)
    mt = m.t()  # if complex: mt = m.t().conj()
    return fact * m.matmul(mt).squeeze()
```
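As a sanity check, recent PyTorch (1.10 or later, which I am assuming here) ships a built-in `torch.cov` that treats each row as a variable, matching `cov(m, rowvar=True)` above:

```python
import torch

A = torch.tensor([[90., 60., 90.], [90., 90., 30.]], dtype=torch.double)
# torch.cov treats each row as a variable, like cov(m=A, rowvar=True).
print(torch.cov(A))
# tensor([[ 300., -300.],
#         [-300., 1200.]], dtype=torch.float64)
```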

You can also calculate the gradient as follows.

```python
def gradient(y, x, grad_outputs=None):
    '''Compute dy/dx, weighting a multi-dimensional y by grad_outputs (ones by default).'''
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    return torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
```

Now let’s take an example.

```python
x, y = [90, 60, 90], [90, 90, 30]
A = torch.tensor([x, y], dtype=torch.double, requires_grad=True)
c = cov(m=A, rowvar=True)
```

```
tensor([[ 300., -300.],
        [-300., 1200.]], dtype=torch.float64, grad_fn=<MulBackward0>)
```

You can now calculate the gradient of `c` with respect to `A`, like this:

```python
gradient(y=c, x=A)
```

But I guess this is not the behaviour you want: for a multi-dimensional `y`, `torch.autograd.grad` sums the per-element gradients (via `grad_outputs`) rather than returning the full Jacobian.
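To see what that reduction looks like, here is a minimal sketch (using a simple matrix product in place of `cov`, purely for illustration):

```python
import torch

A = torch.tensor([[90., 60., 90.], [90., 90., 30.]],
                 dtype=torch.double, requires_grad=True)
c = A @ A.t()  # any matrix-valued function of A works for this demo

# With grad_outputs=ones, autograd.grad returns the *sum* of the
# per-element gradients, sum_{i,j} d c[i,j] / dA -- not a Jacobian.
g, = torch.autograd.grad(c, A, grad_outputs=torch.ones_like(c))
```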

So you can instead calculate `dc[i][j]/dx` or `dc[i][j]/dy` one element at a time, like this:

```python
i, j = 0, 0
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

```
tensor([[ 10., -20.,  10.],
        [  0.,   0.,   0.]], dtype=torch.float64)
```

Since `c[0][0]` (the variance of `x`) is independent of `y`, the second row of `dcij_dA` is zero. Similarly, you can see that the derivative of `c[1][1]` with respect to `x` is zero:

```python
i, j = 1, 1
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```
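The non-zero row in the first output can be checked against the analytic gradient of the sample variance; a quick sketch:

```python
import torch

x = torch.tensor([90., 60., 90.], dtype=torch.double, requires_grad=True)
var_x = ((x - x.mean()) ** 2).sum() / (x.numel() - 1)  # this is c[0][0] above
g, = torch.autograd.grad(var_x, x)
# Analytically d var(x) / d x_k = 2 / (n - 1) * (x_k - mean(x)),
# which gives [10., -20., 10.] for this data.
```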

You can also notice that `c[0][1] = c[1][0]` (covariance is symmetric), both having non-zero derivatives with respect to `x` and `y`.

```python
i, j = 0, 1
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

```python
i, j = 1, 0
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

Finally, you can assemble every elementwise gradient into a single tensor:

```python
n, m = c.shape
# jac[i, j] holds d c[i, j] / dA and has the same shape as A
jac = torch.zeros(n, m, *A.shape, dtype=torch.double)
for i in range(n):
    for j in range(m):
        jac[i, j] = gradient(y=c[i][j], x=A)
```
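For the higher-derivative case mentioned in the question (`d/dx d/dy c(xi, yj)`), a second `autograd.grad` call works, as long as the first pass is made with `create_graph=True`; a minimal sketch, again assuming a hypothetical RBF kernel:

```python
import torch

# Hypothetical RBF kernel, purely for illustration.
def c(x, y):
    return torch.exp(-(x - y) ** 2)

x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

k = c(x, y)
# create_graph=True keeps the graph of the first derivative so that
# it can itself be differentiated.
dk_dx, = torch.autograd.grad(k, x, create_graph=True)
d2k_dxdy, = torch.autograd.grad(dk_dx, y)  # d/dy d/dx c(x, y)
```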