Pairwise correlation

Mark_Esteins · May 26, 2022, 9:25am

Hi,

What would be the best way to calculate pairwise correlations between two tensors?
I’m trying to replicate this pandas dataframe feature:
u = df[features_list].corrwith(df['prediction'])
https://docs.pymars.org/en/latest/reference/dataframe/generated/mars.dataframe.DataFrame.corrwith.html

I need to calculate the pairwise correlations of a matrix with a vector, and I would like to do it on the GPU (I could do it with NumPy and then convert to a tensor, but it’s quite slow).

Thanks in advance.

Andrei_Cristea · May 26, 2022, 12:45pm

I’m not super familiar with the pandas function, but if this description is what you need, you can just do this:

import torch

y = 200  # length of vector
x = 1000  # number of such vectors that make up the matrix

# generate some data for illustration
vector = torch.randn(y)
matrix = 0.5 * vector + 0.5 * torch.randn(x, y)

# normalized dot product of each row with the vector; this equals the Pearson
# correlation only when the rows and the vector have (close to) zero mean, as here
correlation = (matrix * vector).sum(dim=1) / ((matrix * matrix).sum(dim=1) * (vector * vector).sum()).sqrt()
print(correlation.shape)
print(correlation.mean())

Output:
torch.Size([1000])  # one correlation for each vector in the matrix
tensor(0.7248)  # close to sqrt(0.5) which is the expected result
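The formula above is a normalized dot product (cosine similarity). It matches the Pearson correlation only when each row and the vector have roughly zero mean, which happens to hold here because the data come from torch.randn. For arbitrary data, center the inputs first. Below is a minimal sketch that runs on the GPU when one is available and falls back to the CPU otherwise. The seed and the NumPy cross-check are illustrative additions, not part of the answer above.

```python
import numpy as np
import torch

# Use the GPU if there is one, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(0)
y, x = 200, 1000
vector = torch.randn(y, device=device)
matrix = 0.5 * vector + 0.5 * torch.randn(x, y, device=device)

# Center each row of the matrix and the vector, then take the normalized dot product.
m_c = matrix - matrix.mean(dim=1, keepdim=True)
v_c = vector - vector.mean()
correlation = (m_c * v_c).sum(dim=1) / (m_c.pow(2).sum(dim=1) * v_c.pow(2).sum()).sqrt()

# Cross-check the first few rows against NumPy's Pearson correlation.
m_np, v_np = matrix.cpu().numpy(), vector.cpu().numpy()
reference = np.array([np.corrcoef(m_np[i], v_np)[0, 1] for i in range(5)])
print(np.allclose(correlation[:5].cpu().numpy(), reference, atol=1e-5))
```

All of the work stays on the chosen device. Only the small cross-check copies data back to the CPU.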

Mark_Esteins · May 26, 2022, 6:43pm

Thanks @Andrei_Cristea. The problem is that I’m having trouble with the dimensions. I need the output to be the correlation of each column of the DataFrame with one specific column. Here is a code example, followed by its output:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=list('ABCD'))
print("DATAFRAME")
print(df)
print("\nCORRELATIONS")
feature_cols = ['A','B','C', 'D']
corrs = df.loc[:, feature_cols].corrwith(df['D'])
print(corrs)

And the output looks like this:

DATAFRAME
   A  B  C  D
0  7  1  1  1
1  9  1  8  5
2  8  3  2  1
3  8  7  4  6
4  9  5  0  0
5  0  7  4  9
6  6  5  4  8
7  5  9  0  3
8  1  0  8  2
9  1  1  7  3

CORRELATIONS
A   -0.300373
B    0.410389
C    0.322430
D    1.000000
dtype: float64

I don’t know how to structure the tensors to get output with the same structure as the one shown above.

Thanks in advance.
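For reference, with its default method='pearson', corrwith computes the following for each feature column j. It is the Pearson correlation between that column and the target column y (column D above) over the n rows, where x̄_j and ȳ are the column means:

```latex
r_j = \frac{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)(y_i - \bar{y})}
           {\sqrt{\sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2 \; \sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because y is column D itself, r_D = 1, which matches the last entry of the output.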

Andrei_Cristea · May 26, 2022, 6:53pm

Just a small change to the above.

import torch

y = 10  # number of rows
x = 4  # columns A-D

# generate some data for illustration (integers 0-9, stored as floats)
matrix = torch.empty(y, x).random_(10)
mat_dm = matrix - matrix.mean(dim=0)  # demean each column
vec = mat_dm[:, -1]  # demeaned column D is the target

correlation = (mat_dm.T * vec).sum(dim=1) / ((mat_dm.T * mat_dm.T).sum(dim=1) * (vec * vec).sum()).sqrt()

print("TENSOR")
print(matrix)
print("CORRELATIONS")
print(correlation)

TENSOR
tensor([[3., 3., 2., 2.],
        [2., 7., 2., 9.],
        [5., 5., 4., 4.],
        [2., 0., 5., 3.],
        [7., 4., 2., 5.],
        [2., 9., 1., 6.],
        [8., 1., 2., 9.],
        [7., 9., 2., 4.],
        [9., 5., 4., 4.],
        [9., 9., 3., 7.]])
CORRELATIONS
tensor([ 0.1533,  0.2160, -0.4095,  1.0000])
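The same computation can be wrapped in a small reusable function and checked against pandas. This is a minimal sketch, assuming PyTorch 1.10 or later for torch.corrcoef. The function name corrwith here is hypothetical (it is not a PyTorch API), and the random DataFrame stands in for real data.

```python
import numpy as np
import pandas as pd
import torch


def corrwith(matrix: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: Pearson correlation of each column of `matrix` (n x k) with `target` (n,)."""
    mat_dm = matrix - matrix.mean(dim=0)  # demean each column
    tgt_dm = target - target.mean()       # demean the target
    num = (mat_dm * tgt_dm.unsqueeze(1)).sum(dim=0)
    den = (mat_dm.pow(2).sum(dim=0) * tgt_dm.pow(2).sum()).sqrt()
    return num / den


device = "cuda" if torch.cuda.is_available() else "cpu"

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, size=(10, 4)), columns=list("ABCD"))

# float64 keeps the comparison with pandas tight
matrix = torch.tensor(df.to_numpy(), dtype=torch.float64, device=device)
corrs = corrwith(matrix, matrix[:, -1])

# Alternative: torch.corrcoef treats rows as variables, so pass the transpose
# and keep the row that belongs to column D.
corrs_alt = torch.corrcoef(matrix.T)[-1]

expected = df.corrwith(df["D"]).to_numpy()
print(corrs.cpu().numpy())
print(np.allclose(corrs.cpu().numpy(), expected), np.allclose(corrs_alt.cpu().numpy(), expected))
```

torch.corrcoef builds the full k x k correlation matrix, which is wasteful when only one row is needed. The hand-written function computes just the k correlations with the target. Unlike pandas, neither version skips missing values, so the input is assumed to contain no NaNs.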

Mark_Esteins · May 26, 2022, 7:16pm

That’s exactly what I was looking for!

Thanks!