# Derivative of covariance matrix

Suppose I have a bivariate covariance function `c(x,y)` and inputs `[x1,...,xn]` and `[y1,...,ym]`, and I would like to compute the covariance matrix with `i,j`th element `d/dx c(xi,yj)`. Here I have assumed that each `xi` and `yj` is scalar-valued; however, this could easily be extended to finding the `k`th partial derivative of the covariance matrix. In general, I would like this to extend to higher derivatives too (e.g. I would like to compute the matrix with `i,j`th element `d/dx d/dy c(xi,yj)`), but that may be an easy generalisation of the first problem.

I feel that PyTorch's automatic differentiation system should be able to handle this, but I haven't been able to come up with a nice solution. I think the issue has to do with broadcasting derivatives. For instance, to compute the non-differentiated covariance matrix I would do something like `c(x.unsqueeze(1), y)`, but the following code doesn't work as I would like (using `autograd.grad`):

```python
X = torch.Tensor([1, 2, 3]).requires_grad_(True).unsqueeze(1)
Y = torch.Tensor([1, 2, 3])
cov_matrix = c(X, Y)
torch.autograd.grad(cov_matrix, X)
# RuntimeError: grad can be implicitly created only for scalar outputs
```
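For concreteness, here is the kind of matrix I am after, computed by brute force with one `autograd.grad` call per column (assuming a toy RBF kernel for `c`, since my actual kernel doesn't matter here); I would like something less manual than this:

```python
import torch

# Hypothetical RBF kernel standing in for c(x, y); any smooth
# covariance function works the same way.
def c(x, y):
    return torch.exp(-(x - y) ** 2)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = torch.tensor([0.5, 1.5])

K = c(x.unsqueeze(1), y)  # (3, 2) covariance matrix via broadcasting

# K[i, j] depends on x only through x[i], so one backward pass per
# column j (with a one-hot grad_outputs) recovers d/dx c(x_i, y_j).
dK_dx = torch.zeros_like(K)
for j in range(K.shape[1]):
    g = torch.zeros_like(K)
    g[:, j] = 1.0
    dK_dx[:, j] = torch.autograd.grad(K, x, grad_outputs=g, retain_graph=True)[0]
```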

Any help would be appreciated! Thanks

Here is an implementation of the covariance matrix in PyTorch, adapted from covariance-and-gradient-support:

```python
import torch

def cov(m, rowvar=False, inplace=False):
    '''Estimate a covariance matrix given data.

    Thanks:
    - https://github.com/numpy/numpy/blob/master/numpy/lib/function_base.py#L2276-L2494

    Covariance indicates the level to which two variables vary together.
    If we examine N-dimensional samples, `X = [x_1, x_2, ... x_N]^T`,
    then the covariance matrix element `C_{ij}` is the covariance of
    `x_i` and `x_j`. The element `C_{ii}` is the variance of `x_i`.

    Args:
        m: A 1-D or 2-D array containing multiple variables and observations.
            Each row of `m` represents a variable, and each column a single
            observation of all those variables.
        rowvar: If `rowvar` is True, then each row represents a
            variable, with observations in the columns. Otherwise, the
            relationship is transposed: each column represents a variable,
            while the rows contain observations.

    Returns:
        The covariance matrix of the variables.
    '''
    if m.dim() > 2:
        raise ValueError('m has more than 2 dimensions')
    if m.dim() < 2:
        m = m.view(1, -1)
    if not rowvar and m.size(0) != 1:
        m = m.t()
    m = m.type(torch.double)
    fact = 1.0 / (m.size(1) - 1)
    if inplace:
        m -= torch.mean(m, dim=1, keepdim=True)
    else:
        m = m - torch.mean(m, dim=1, keepdim=True)
    mt = m.t()  # if complex: mt = m.t().conj()
    return fact * m.matmul(mt).squeeze()
```
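As a sanity check, recent PyTorch (1.10 or later, which I am assuming here) ships a built-in `torch.cov` that treats each row as a variable, matching `cov(m, rowvar=True)` above:

```python
import torch

A = torch.tensor([[90., 60., 90.], [90., 90., 30.]], dtype=torch.double)
# torch.cov treats each row as a variable, like cov(m=A, rowvar=True).
print(torch.cov(A))
# tensor([[ 300., -300.],
#         [-300., 1200.]], dtype=torch.float64)
```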

You can also calculate the gradient as follows.

```python
def gradient(y, x, grad_outputs=None):
    '''Compute dy/dx, weighting a multi-dimensional y by grad_outputs (ones by default).'''
    if grad_outputs is None:
        grad_outputs = torch.ones_like(y)
    return torch.autograd.grad(y, [x], grad_outputs=grad_outputs, create_graph=True)[0]
```

Now let’s take an example.

```python
x, y = [90, 60, 90], [90, 90, 30]
A = torch.tensor([x, y], dtype=torch.double, requires_grad=True)
c = cov(m=A, rowvar=True)
```

```
tensor([[ 300., -300.],
        [-300., 1200.]], dtype=torch.float64, grad_fn=<MulBackward0>)
```

You can now calculate the gradient of `c` with respect to `A`, like this:

```python
gradient(y=c, x=A)
```

But I guess this is not the behaviour you want: for a multi-dimensional `y`, `torch.autograd.grad` sums the per-element gradients (via `grad_outputs`) rather than returning the full Jacobian.
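To see what that reduction looks like, here is a minimal sketch (using a simple matrix product in place of `cov`, purely for illustration):

```python
import torch

A = torch.tensor([[90., 60., 90.], [90., 90., 30.]],
                 dtype=torch.double, requires_grad=True)
c = A @ A.t()  # any matrix-valued function of A works for this demo

# With grad_outputs=ones, autograd.grad returns the *sum* of the
# per-element gradients, sum_{i,j} d c[i,j] / dA -- not a Jacobian.
g, = torch.autograd.grad(c, A, grad_outputs=torch.ones_like(c))
```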

So you can instead calculate `dc[i][j]/dx` or `dc[i][j]/dy` one element at a time, like this:

```python
i, j = 0, 0
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

```
tensor([[ 10., -20.,  10.],
        [  0.,   0.,   0.]], dtype=torch.float64)
```

Since `c[0][0]` (the variance of `x`) is independent of `y`, the second row of `dcij_dA` is zero. Similarly, you can see that the derivative of `c[1][1]` with respect to `x` is zero:

```python
i, j = 1, 1
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```
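The non-zero row in the first output can be checked against the analytic gradient of the sample variance; a quick sketch:

```python
import torch

x = torch.tensor([90., 60., 90.], dtype=torch.double, requires_grad=True)
var_x = ((x - x.mean()) ** 2).sum() / (x.numel() - 1)  # this is c[0][0] above
g, = torch.autograd.grad(var_x, x)
# Analytically d var(x) / d x_k = 2 / (n - 1) * (x_k - mean(x)),
# which gives [10., -20., 10.] for this data.
```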

You can also notice that `c[0][1] = c[1][0]` (covariance is symmetric), both having non-zero derivatives with respect to `x` and `y`.

```python
i, j = 0, 1
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

```python
i, j = 1, 0
dcij_dA = gradient(y=c[i][j], x=A)
dcij_dA
```

Finally, you can assemble every elementwise gradient into a single tensor:

```python
n, m = c.shape
# jac[i, j] holds d c[i, j] / dA and has the same shape as A
jac = torch.zeros(n, m, *A.shape, dtype=torch.double)
for i in range(n):
    for j in range(m):
        jac[i, j] = gradient(y=c[i][j], x=A)
```
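For the higher-derivative case mentioned in the question (`d/dx d/dy c(xi, yj)`), a second `autograd.grad` call works, as long as the first pass is made with `create_graph=True`; a minimal sketch, again assuming a hypothetical RBF kernel:

```python
import torch

# Hypothetical RBF kernel, purely for illustration.
def c(x, y):
    return torch.exp(-(x - y) ** 2)

x = torch.tensor([1.0], requires_grad=True)
y = torch.tensor([2.0], requires_grad=True)

k = c(x, y)
# create_graph=True keeps the graph of the first derivative so that
# it can itself be differentiated.
dk_dx, = torch.autograd.grad(k, x, create_graph=True)
d2k_dxdy, = torch.autograd.grad(dk_dx, y)  # d/dy d/dx c(x, y)
```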