# Understanding the cosine similarity function in PyTorch

I have a little difficulty understanding what happens when we use the PyTorch cosine similarity function.

Consider this example:

``````
import torch
import torch.nn as nn

input1 = torch.abs(torch.randn(1, 2, 20, 20))
input2 = torch.abs(torch.randn(1, 2, 20, 20))
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
output = cos(input1, input2)
print(output.size())
``````

output:

``````
torch.Size([20, 20])
``````

I was expecting an output of size `2x20x20`. Can someone please explain why that is not the case?

Moreover, is there a way to compute the cosine similarity for each channel separately and get an output of size `2x20x20`?

Thanks

Any thoughts on this?

Based on the docs, the output will have the same shape as the inputs, minus the dim that is used to compute the similarity.
So if you would like to reduce over the batch dimension and get an output of `[2, 20, 20]`, you could use `dim=0`.
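In case it helps, here is a minimal sketch of that shape rule (note that on recent PyTorch versions the size-1 batch dimension is kept, so `dim=1` yields `[1, 20, 20]` rather than `[20, 20]`):

```python
import torch
import torch.nn.functional as F

a = torch.randn(1, 2, 20, 20)
b = torch.randn(1, 2, 20, 20)

# The output shape is the input shape with the reduced dim removed.
print(F.cosine_similarity(a, b, dim=1).shape)  # torch.Size([1, 20, 20])
print(F.cosine_similarity(a, b, dim=0).shape)  # torch.Size([2, 20, 20])
```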


@ptrblck
I see, but why does it give me `torch.Size([20, 20])` when I set `dim=1`? How does it actually compute it?

Also, the results are kinda weird when `dim=0`: every value is just 1, and I'm confused.

It should give you an output of shape `[1, 20, 20]`, so that's strange.
Could you check it again, please?

The cosine similarity is calculated between both tensors along the specified dimension. All other dimensions are treated as extra batch dimensions and are not reduced in the calculation.
You can also reshape your input tensors to `[batch_size, 2]` and will get the same result:

``````
import torch
import torch.nn.functional as F

input1 = torch.abs(torch.randn(1, 2, 20, 20))
input2 = torch.abs(torch.randn(1, 2, 20, 20))

res1 = F.cosine_similarity(input1, input2, 1)

res2 = F.cosine_similarity(
    input1.permute(0, 2, 3, 1).view(-1, 2),
    input2.permute(0, 2, 3, 1).view(-1, 2),
    1).view(1, 20, 20)

print((res1 == res2).all())
``````

You can find the implementation here.


So when `dim=1`, it computes the similarity along dimension 1 and basically considers all the channels together. That part makes sense. Though the output is still of size `[20, 20]`, not `[1, 20, 20]`; maybe that is because I'm using PyTorch 0.3.0?

``````
input1 = torch.abs(torch.randn(1, 2, 10, 10))
input2 = torch.abs(torch.randn(1, 2, 10, 10))
res1 = F.cosine_similarity(input1, input2, 1)
print(res1)
print(res1.size())
``````

output:

``````
0.9249  0.9581  0.9964  0.9384  1.0000  0.7050  0.7339  0.6200  0.3887  0.8247
0.8397  0.9821  0.8112  0.9300  0.9955  0.9970  0.9599  0.9871  0.9546  0.2274
0.9980  0.9903  0.9990  0.9977  0.5832  0.7850  0.9049  0.8266  0.9084  0.9682
0.9949  0.9895  0.8929  0.8659  0.7442  0.5848  0.9990  0.8466  0.9778  1.0000
0.9786  0.9972  0.5892  0.2555  0.6968  0.7367  0.9168  0.8906  0.8962  0.0922
0.8235  0.5739  0.5015  0.9879  0.5706  0.9696  0.9995  0.7057  0.9877  0.8018
0.8789  0.9820  0.7538  0.9882  0.9999  0.2345  0.7596  0.9877  0.9749  0.9463
0.9243  0.9671  0.7078  0.3916  1.0000  0.9979  0.9256  1.0000  0.9740  0.7148
0.9987  0.9342  0.2270  0.8224  0.9970  0.9744  0.8185  0.9213  0.8891  0.9911
0.9607  0.9490  0.9766  0.9463  0.7205  0.9997  0.9150  0.7641  0.5461  0.7848
[torch.FloatTensor of size 10x10]

torch.Size([10, 10])
``````

But the second part, where I put `dim=0`, gives me all values equal to 1, and that does not make sense to me yet.
I looked at the code, but can you please tell me one more time why it gives an output of all ones?

``````res1 = F.cosine_similarity(input1, input2, 0)
print(res1)
print(res1.size())
``````

output:

``````(0 ,.,.) =
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1

(1 ,.,.) =
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
1   1   1   1   1   1   1   1   1   1
[torch.FloatTensor of size 2x10x10]

torch.Size([2, 10, 10])
``````

Well, because this dimension has size 1, you are basically measuring the cosine similarity between single points instead of vectors.
This also means that the L2 norm in this dimension is just the absolute value, so each result is ±1, and since you applied `torch.abs` to the inputs, you get all ones.
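A tiny sketch of that point, using toy values I made up: for 1-element "vectors", the cosine similarity reduces to the product of the signs.

```python
import torch
import torch.nn.functional as F

# cos(x, y) = x*y / (|x| * |y|) = sign(x) * sign(y) for single scalars.
x = torch.tensor([[3.0]])
y = torch.tensor([[0.5]])
print(F.cosine_similarity(x, y, dim=1))  # tensor([1.])

# After torch.abs, both signs are positive, so every entry becomes 1.
print(F.cosine_similarity(torch.tensor([[-3.0]]).abs(), y, dim=1))  # tensor([1.])
```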

EDIT: Yeah, your older version might yield a different shape.


I see… Thank you for the clarification.
Do you think there is a way to reshape the inputs to compute the similarity of each channel (let's say our batch size is always 1) and get an output of
`torch.Size([2, 10, 10])`?

Maybe there is a way, but let's first clarify your use case.
I'm not quite sure what the cosine similarity should calculate in this case.
Assume we have two tensors with image dimensions `[1, 2, 10, 10]`.
Now let's say one tensor stores all ones (call it tensor `y`).
The other consists of two `[10, 10]` slices, where one channel is also all ones, while the other is a linspace from 0 to 1 (call it tensor `x`).
We can now see the channels as coordinates of vectors, i.e. each of the 100 pixels has two coordinates.
While `y` holds all `[1, 1]` vectors, `x`'s pixels have different vectors with values between `[0, 1]` and `[1, 1]`.
We would therefore expect the cosine similarity output to be between `sqrt(2)/2 ≈ 0.7071` and `1.`.

Let's see an example:

``````
import torch
import torch.nn.functional as F

x = torch.cat(
    (torch.linspace(0, 1, 10)[None, None, :].repeat(1, 10, 1),
     torch.ones(1, 10, 10)), 0)
y = torch.ones(2, 10, 10)
print(F.cosine_similarity(x, y, 0))
``````

It seems to work this way.

If you expect an output of `[2, 10, 10]`, the similarity would have to be calculated elementwise somehow.
I'm not sure how you would like to use the cosine similarity for that.
Could you explain what kind of information is stored in your images, pixels, channels etc.?
Maybe I’m just misunderstanding the issue completely.

I totally agree with the first part of your explanation.

Let me explain a little more about why I want to compute it like that.

Here is what I am trying to do:
Let's say I have feature tensors `x = torch.ones(1, 2, 10, 10)` and `y = torch.ones(1, 2, 10, 10)`.
What I'm trying to do is to weight each feature channel in `y` based on the similarity of that channel with its corresponding channel in `x`.
Of course, I could compute the cosine similarity for the whole of `x` and `y` and just multiply each channel of `y` with that similarity via `mul`, but I feel I should compute the similarity between the feature channels separately: channel 1 should be weighted with the similarity between `x[0,0,:,:]` and `y[0,0,:,:]`, and channel 2 with the similarity between `x[0,1,:,:]` and `y[0,1,:,:]`.

Please let me know if it is not clear.

Thanks for the explanation.
In this case, would you want to use the `10x10` pixels as the vectors to calculate the cosine similarity?
Each channel would then hold a 100-dimensional vector pointing somewhere, and you could calculate the similarity between the channels.

``````
import torch
import torch.nn.functional as F

a = torch.randn(1, 2, 10, 10)
b = torch.randn(1, 2, 10, 10)
F.cosine_similarity(a.view(1, 2, -1), b.view(1, 2, -1), 2)
> tensor([[-0.0755,  0.0896]])
``````

Now you could use these two values to weight your channels.
Would that make sense?
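As a rough sketch of the full weighting step (the tensor names here are just illustrative), the two per-channel similarity values can be broadcast back over the spatial dimensions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 2, 10, 10)
y = torch.randn(1, 2, 10, 10)

# Per-channel similarity: flatten each 10x10 channel into a 100-dim vector.
sim = F.cosine_similarity(x.view(1, 2, -1), y.view(1, 2, -1), dim=2)  # shape [1, 2]

# Broadcast each channel's scalar weight over its 10x10 spatial map.
weighted = y * sim.view(1, 2, 1, 1)
print(weighted.shape)  # torch.Size([1, 2, 10, 10])
```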


@ptrblck: I am using `torch.bmm` to compute the cosine distance. It returns a matrix of size NxN, whereas `nn.CosineSimilarity` only gives the diagonal of that matrix. How can I use `nn.CosineSimilarity` to get the full cosine matrix, as `torch.bmm` does? This is my code.

``````
import torch
import torch.nn as nn

input1 = torch.randn(2, 4, 4)
input2 = torch.randn(2, 4, 4)

# Using bmm
x_norm = input1 / torch.norm(input1, p=2, dim=1, keepdim=True)
y_norm = input2 / torch.norm(input2, p=2, dim=1, keepdim=True)
cosine_sim = torch.bmm(x_norm.transpose(2, 1), y_norm)
print('Using bmm: \n', cosine_sim)

# PyTorch built-in
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
cosine_sim = cos(input1, input2)
print('Using nn: \n', cosine_sim)
``````

The output is

``````Using bmm:
tensor([[[-0.0230,  0.2983,  0.0487,  0.3974],
[-0.5747,  0.5513, -0.6436, -0.1389],
[-0.3876, -0.2107,  0.7093, -0.4929],
[-0.3446, -0.5347,  0.6372, -0.6423]],

[[-0.3842, -0.0349,  0.1621,  0.6400],
[ 0.6776, -0.4812, -0.3169, -0.7976],
[-0.5251, -0.1258,  0.9381, -0.2379],
[-0.1517,  0.7164,  0.8332,  0.1668]]])
Using nn:
tensor([[-0.0230,  0.5513,  0.7093, -0.6423],
[-0.3842, -0.4812,  0.9381,  0.1668]])
``````

Do you know how to use `nn.CosineSimilarity` to achieve a result similar to `torch.bmm`? I cannot use `torch.bmm` because of a CUDA out-of-memory error.
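One possible approach (a sketch, assuming a reasonably recent PyTorch version where `F.cosine_similarity` broadcasts its inputs) is to unsqueeze the two tensors so that every column of `input1` is compared against every column of `input2`. Note that this broadcasts to an intermediate of shape `[batch, N, N, dim]` internally, so it won't necessarily use less memory than `bmm`; chunking over rows may still be needed for large N.

```python
import torch
import torch.nn.functional as F

input1 = torch.randn(2, 4, 4)  # [batch, feat_dim, N]
input2 = torch.randn(2, 4, 4)

# Reference result via normalized bmm.
x_norm = input1 / torch.norm(input1, p=2, dim=1, keepdim=True)
y_norm = input2 / torch.norm(input2, p=2, dim=1, keepdim=True)
ref = torch.bmm(x_norm.transpose(2, 1), y_norm)  # [2, 4, 4]

# Same full matrix via broadcasting: entry [b, i, j] is
# cos(input1[b, :, i], input2[b, :, j]).
full = F.cosine_similarity(
    input1.transpose(2, 1).unsqueeze(2),  # [2, 4, 1, 4]
    input2.transpose(2, 1).unsqueeze(1),  # [2, 1, 4, 4]
    dim=3)                                # [2, 4, 4]

print(torch.allclose(ref, full, atol=1e-5))  # True
```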

I am using cosine similarity to check the similarity of sentence embeddings. I have 200 texts in each of two sets, and I am getting the embeddings from a model.
The size of each embedding is (200, 52, 784).
Now when I use cosine similarity, it returns a tensor of size (200, 784).
But what I want is a single percentage value which represents the total similarity between these two sets. How can I do that?
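One possible reduction, sketched below with random stand-in embeddings, is to average the per-position similarities and rescale from [-1, 1] to a percentage. This is an assumption on my part, not the only sensible definition of "total similarity" (the `dim=1` choice simply matches the (200, 784) output shape reported above):

```python
import torch
import torch.nn.functional as F

# Stand-in embeddings; in practice these come from the model.
emb1 = torch.randn(200, 52, 784)
emb2 = torch.randn(200, 52, 784)

# Per-position similarity along dim=1 gives the reported [200, 784] shape.
sim = F.cosine_similarity(emb1, emb2, dim=1)

# Average everything to one score in [-1, 1], then map to [0, 100] percent.
score = sim.mean()
percent = (score + 1) / 2 * 100
print(f"{percent.item():.1f}%")
```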

How did you do what you wanted to do? I have to do a similar thing, and the paper I am implementing says to just select the argmax. (There are n vectors which are the output of the cosine similarity; each of these vectors is an array of tensors.)

Have you found a way to calculate the similarity of each channel?