# Understanding the cosine similarity function in PyTorch

I have a little difficulty understanding what happens when we use the PyTorch cosine similarity function.

Consider this example:

``````
import torch
import torch.nn as nn

input1 = torch.abs(torch.randn(1, 2, 20, 20))
input2 = torch.abs(torch.randn(1, 2, 20, 20))
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
output = cos(input1, input2)
print(output.size())
``````

output:

``````
torch.Size([20, 20])
``````

I was expecting an output of size `2x20x20`. Can someone please explain why that is not the case?

Moreover, is there a way to compute the cosine similarity for each channel separately and get an output of size `2x20x20`?

Thanks

Any thoughts on this?

Based on the docs, the output will have the same shape as the inputs, minus the dim that is used to compute the similarity.
So if you would like to reduce over the batch dimension and get an output of `[2, 20, 20]`, you could use `dim=0`.
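In case it helps, here is a minimal sketch of that shape rule (note that on recent PyTorch versions the size-1 batch dimension is kept, so `dim=1` yields `[1, 20, 20]` rather than `[20, 20]`):

```python
import torch
import torch.nn.functional as F

a = torch.randn(1, 2, 20, 20)
b = torch.randn(1, 2, 20, 20)

# The output shape is the input shape with the reduced dim removed.
print(F.cosine_similarity(a, b, dim=1).shape)  # torch.Size([1, 20, 20])
print(F.cosine_similarity(a, b, dim=0).shape)  # torch.Size([2, 20, 20])
```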


@ptrblck
I see, but why does it give me `torch.Size([20, 20])` when I set `dim=1`? How does it actually compute it?

Also, the results are kinda weird when `dim=0`: every value is just 1, and I'm confused.

It should give you an output of shape `[1, 20, 20]`, so that's strange.
Could you check it again, please?

The cosine similarity is calculated between both tensors along the specified dimension. All other dimensions are treated as extra batch dimensions and are not reduced in the calculation.
You can also reshape your input tensors to `[batch_size, 2]` and will get the same result:

``````
import torch
import torch.nn.functional as F

input1 = torch.abs(torch.randn(1, 2, 20, 20))
input2 = torch.abs(torch.randn(1, 2, 20, 20))

res1 = F.cosine_similarity(input1, input2, 1)

res2 = F.cosine_similarity(
    input1.permute(0, 2, 3, 1).view(-1, 2),
    input2.permute(0, 2, 3, 1).view(-1, 2),
    1).view(1, 20, 20)

print((res1 == res2).all())
``````

You can find the implementation here.


So when `dim=1`, it computes the similarity along dimension 1 and basically considers all the channels together. That part makes sense. Though the output is still of size `[20, 20]`, not `[1, 20, 20]`; maybe that is because I'm using PyTorch 0.3.0?

``````
input1 = torch.abs(torch.randn(1, 2, 10, 10))
input2 = torch.abs(torch.randn(1, 2, 10, 10))
res1 = F.cosine_similarity(input1, input2, 1)
print(res1)
print(res1.size())
``````

output:

``````
0.9249  0.9581  0.9964  0.9384  1.0000  0.7050  0.7339  0.6200  0.3887  0.8247
0.8397  0.9821  0.8112  0.9300  0.9955  0.9970  0.9599  0.9871  0.9546  0.2274
0.9980  0.9903  0.9990  0.9977  0.5832  0.7850  0.9049  0.8266  0.9084  0.9682
0.9949  0.9895  0.8929  0.8659  0.7442  0.5848  0.9990  0.8466  0.9778  1.0000
0.9786  0.9972  0.5892  0.2555  0.6968  0.7367  0.9168  0.8906  0.8962  0.0922
0.8235  0.5739  0.5015  0.9879  0.5706  0.9696  0.9995  0.7057  0.9877  0.8018
0.8789  0.9820  0.7538  0.9882  0.9999  0.2345  0.7596  0.9877  0.9749  0.9463
0.9243  0.9671  0.7078  0.3916  1.0000  0.9979  0.9256  1.0000  0.9740  0.7148
0.9987  0.9342  0.2270  0.8224  0.9970  0.9744  0.8185  0.9213  0.8891  0.9911
0.9607  0.9490  0.9766  0.9463  0.7205  0.9997  0.9150  0.7641  0.5461  0.7848
[torch.FloatTensor of size 10x10]

torch.Size([10, 10])
``````

But the second part, where I put `dim=0`, gives me all values equal to 1, and that does not make sense to me yet.
I looked at the code, but can you please tell me one more time why it gives an output of all ones?

``````res1 = F.cosine_similarity(input1, input2, 0)
print(res1)
print(res1.size())
``````

output:

``````(0 ,.,.) =
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1

(1 ,.,.) =
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
[torch.FloatTensor of size 2x10x10]

torch.Size([2, 10, 10])
``````

Well, because this dimension has size 1, you are basically measuring the cosine similarity between single points instead of vectors.
This also means that the L2 norm in this dimension is just the absolute value, so each result is ±1, and since you applied `torch.abs` to the inputs, you get all ones.
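A tiny sketch of that point, using toy values I made up: for 1-element "vectors", the cosine similarity reduces to the product of the signs.

```python
import torch
import torch.nn.functional as F

# cos(x, y) = x*y / (|x| * |y|) = sign(x) * sign(y) for single scalars.
x = torch.tensor([[3.0]])
y = torch.tensor([[0.5]])
print(F.cosine_similarity(x, y, dim=1))  # tensor([1.])

# After torch.abs, both signs are positive, so every entry becomes 1.
print(F.cosine_similarity(torch.tensor([[-3.0]]).abs(), y, dim=1))  # tensor([1.])
```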

EDIT: Yeah, your older version might yield a different shape.


I see… Thank you for the clarification.
Do you think there is a way to reshape the inputs to compute the similarity of each channel (let's say our batch size is always 1) and get an output of
`torch.Size([2, 10, 10])`?

Maybe there is a way, but let's first clarify your use case.
I'm not quite sure what the cosine similarity should calculate in this case.
Assume we have two tensors with image dimensions `[1, 2, 10, 10]`.
Now let's say one tensor stores all ones (call it tensor `y`).
The other consists of two `[10, 10]` slices, where one channel is also all ones, while the other is a linspace from 0 to 1 (call it tensor `x`).
We can now see the channels as coordinates of vectors, i.e. each of the 100 pixels has two coordinates.
While `y` holds all `[1, 1]` vectors, `x`'s pixels have different vectors with values between `[0, 1]` and `[1, 1]`.
We would therefore expect the cosine similarity output to be between `sqrt(2)/2 ≈ 0.7071` and `1.`.

Let's see an example:

``````
import torch
import torch.nn.functional as F

x = torch.cat(
    (torch.linspace(0, 1, 10)[None, None, :].repeat(1, 10, 1),
     torch.ones(1, 10, 10)), 0)
y = torch.ones(2, 10, 10)
print(F.cosine_similarity(x, y, 0))
``````

It seems to work this way.

If you expect an output of `[2, 10, 10]`, the similarity would have to be calculated elementwise somehow.
I'm not sure how you would like to use the cosine similarity for that.
Could you explain what kind of information is stored in your images, pixels, channels etc.?
Maybe I’m just misunderstanding the issue completely.

I totally agree with the first part of your explanation.

Let me explain a little more about why I want to compute it like that.

Here is what I am trying to do:
Let's say I have feature tensors `x = torch.ones(1, 2, 10, 10)` and `y = torch.ones(1, 2, 10, 10)`.
What I'm trying to do is to weight each feature channel in `y` based on the similarity of that channel with its corresponding channel in `x`.
Of course, I could compute the cosine similarity for the whole of `x` and `y` and just multiply each channel of `y` with that similarity via `mul`, but I feel I should compute the similarity between the feature channels separately: channel 1 should be weighted with the similarity between `x[0,0,:,:]` and `y[0,0,:,:]`, and channel 2 with the similarity between `x[0,1,:,:]` and `y[0,1,:,:]`.

Please let me know if it is not clear.

Thanks for the explanation.
In this case, would you want to use the `10x10` pixels as the vectors to calculate the cosine similarity?
Each channel would then hold a 100-dimensional vector pointing somewhere, and you could calculate the similarity between the channels.

``````
import torch
import torch.nn.functional as F

a = torch.randn(1, 2, 10, 10)
b = torch.randn(1, 2, 10, 10)
F.cosine_similarity(a.view(1, 2, -1), b.view(1, 2, -1), 2)
> tensor([[-0.0755,  0.0896]])
``````

Now you could use these two values to weight your channels.
Would that make sense?
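As a rough sketch of the full weighting step (the tensor names here are just illustrative), the two per-channel similarity values can be broadcast back over the spatial dimensions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 10, 10)
y = torch.randn(1, 2, 10, 10)

# Per-channel similarity: flatten each 10x10 channel into a 100-dim vector.
sim = F.cosine_similarity(x.view(1, 2, -1), y.view(1, 2, -1), dim=2)  # shape [1, 2]

# Broadcast each channel's scalar weight over its 10x10 spatial map.
weighted = y * sim.view(1, 2, 1, 1)
print(weighted.shape)  # torch.Size([1, 2, 10, 10])
```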


@ptrblck: I am using `torch.bmm` to compute the cosine distance. It returns a matrix of size NxN, whereas `nn.CosineSimilarity` only gives the diagonal of that matrix. How can I use `nn.CosineSimilarity` to get the full cosine matrix, as `torch.bmm` does? This is my code.

``````
import torch
import torch.nn as nn

input1 = torch.randn(2, 4, 4)
input2 = torch.randn(2, 4, 4)

# Using bmm
x_norm = input1 / torch.norm(input1, p=2, dim=1, keepdim=True)
y_norm = input2 / torch.norm(input2, p=2, dim=1, keepdim=True)
cosine_sim = torch.bmm(x_norm.transpose(2, 1), y_norm)
print('Using bmm: \n', cosine_sim)

# PyTorch built-in
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
cosine_sim = cos(input1, input2)
print('Using nn: \n', cosine_sim)
``````

The output is

``````Using bmm:
tensor([[[-0.0230,  0.2983,  0.0487,  0.3974],
[-0.5747,  0.5513, -0.6436, -0.1389],
[-0.3876, -0.2107,  0.7093, -0.4929],
[-0.3446, -0.5347,  0.6372, -0.6423]],

[[-0.3842, -0.0349,  0.1621,  0.6400],
[ 0.6776, -0.4812, -0.3169, -0.7976],
[-0.5251, -0.1258,  0.9381, -0.2379],
[-0.1517,  0.7164,  0.8332,  0.1668]]])
Using nn:
tensor([[-0.0230,  0.5513,  0.7093, -0.6423],
[-0.3842, -0.4812,  0.9381,  0.1668]])
``````

Do you know how to use `nn.CosineSimilarity` to achieve a result similar to `torch.bmm`? I cannot use `torch.bmm` because of a CUDA out-of-memory error.
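One possible approach (a sketch, assuming a reasonably recent PyTorch version where `F.cosine_similarity` broadcasts its inputs) is to unsqueeze the two tensors so that every column of `input1` is compared against every column of `input2`. Note that this broadcasts to an intermediate of shape `[batch, N, N, dim]` internally, so it won't necessarily use less memory than `bmm`; chunking over rows may still be needed for large N.

```python
import torch
import torch.nn.functional as F

input1 = torch.randn(2, 4, 4)  # [batch, feat_dim, N]
input2 = torch.randn(2, 4, 4)

# Reference result via normalized bmm.
x_norm = input1 / torch.norm(input1, p=2, dim=1, keepdim=True)
y_norm = input2 / torch.norm(input2, p=2, dim=1, keepdim=True)
ref = torch.bmm(x_norm.transpose(2, 1), y_norm)  # [2, 4, 4]

# Same full matrix via broadcasting: entry [b, i, j] is
# cos(input1[b, :, i], input2[b, :, j]).
full = F.cosine_similarity(
    input1.transpose(2, 1).unsqueeze(2),  # [2, 4, 1, 4]
    input2.transpose(2, 1).unsqueeze(1),  # [2, 1, 4, 4]
    dim=3)                                # [2, 4, 4]

print(torch.allclose(ref, full, atol=1e-5))  # True
```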

I am using cosine similarity to check the similarity of sentence embeddings. I have 200 texts in each of two sets, and I am getting the embeddings from a model.
The size of each embedding is (200, 52, 784).
Now when I use cosine similarity, it returns a tensor of size (200, 784).
But what I want is a single percentage value which represents the total similarity between these two sets. How can I do that?
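One possible reduction, sketched below with random stand-in embeddings, is to average the per-position similarities and rescale from [-1, 1] to a percentage. This is an assumption on my part, not the only sensible definition of "total similarity" (the `dim=1` choice simply matches the (200, 784) output shape reported above):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings; in practice these come from the model.
emb1 = torch.randn(200, 52, 784)
emb2 = torch.randn(200, 52, 784)

# Per-position similarity along dim=1 gives the reported [200, 784] shape.
sim = F.cosine_similarity(emb1, emb2, dim=1)

# Average everything to one score in [-1, 1], then map to [0, 100] percent.
score = sim.mean()
percent = (score + 1) / 2 * 100
print(f"{percent.item():.1f}%")
```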

How did you do what you wanted to do? I have to do a similar thing, and the paper I am implementing says to just select the argmax. (There are n vectors which are the output of the cosine similarity; each of these vectors is an array of tensors.)

Have you found a way to calculate the similarity of each channel?