Thanks for the explanation.
In this case, would you want to use the 10x10 pixels as the vector to calculate the cosine similarity?
Each channel would then be a 100-dimensional vector, and you could calculate the similarity between the corresponding channels of the two tensors.
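import torch
import torch.nn.functional as F
a = torch.randn(1, 2, 10, 10)
b = torch.randn(1, 2, 10, 10)
# flatten each 10x10 channel to 100 values, then compare along that dim
F.cosine_similarity(a.view(1, 2, -1), b.view(1, 2, -1), dim=2)
> tensor([[-0.0755, 0.0896]])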
Now you could use these two values to weight your channels.
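A rough sketch of what that weighting could look like, assuming you want to scale each channel of a by its similarity value (the names sim, weights and weighted_a are just placeholders for illustration, and the seed is only there to make the run repeatable):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # only so the sketch is repeatable

a = torch.randn(1, 2, 10, 10)
b = torch.randn(1, 2, 10, 10)

# Flatten each 10x10 channel into a 100-dim vector and compare channel-wise.
sim = F.cosine_similarity(a.flatten(2), b.flatten(2), dim=2)  # shape [1, 2]

# Reshape to [1, 2, 1, 1] so each value broadcasts over its channel's pixels.
weights = sim.view(1, 2, 1, 1)
weighted_a = a * weights  # shape [1, 2, 10, 10]
```

Note that the cosine similarity lies in [-1, 1], so a negative value would flip the sign of that channel. If that is not what you want, you might clamp or rescale the weights first, depending on your use case.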
Would that make sense?