I am using `torch.bmm` to compute the pairwise cosine similarity between two tensors of shape `BxDxN`. It returns a full `NxN` matrix per batch, whereas `nn.CosineSimilarity` returns only a vector of length `N` (the diagonal of that matrix). How can I use `nn.CosineSimilarity` to get the full cosine matrix, as `torch.bmm` does? I cannot use `torch.bmm` because of a CUDA memory error. This is my code.

```
import torch
import torch.nn as nn

input1 = torch.randn(2, 4, 4)
input2 = torch.randn(2, 4, 4)

# Using bmm: L2-normalize along the feature dimension (dim=1),
# then multiply (B, N, D) by (B, D, N) to get a (B, N, N) matrix
x_norm = input1 / torch.norm(input1, p=2, dim=1, keepdim=True)
y_norm = input2 / torch.norm(input2, p=2, dim=1, keepdim=True)
cosine_sim = torch.bmm(x_norm.transpose(2, 1), y_norm)
print('Using bmm: \n', cosine_sim)

# PyTorch CosineSimilarity: compares matching columns only, giving (B, N)
cos = nn.CosineSimilarity(dim=1, eps=1e-6)
cosine_sim = cos(input1, input2)
print('Using nn.CosineSimilarity: \n', cosine_sim)
```

The output is

```
Using bmm:
tensor([[[-0.0230, 0.2983, 0.0487, 0.3974],
[-0.5747, 0.5513, -0.6436, -0.1389],
[-0.3876, -0.2107, 0.7093, -0.4929],
[-0.3446, -0.5347, 0.6372, -0.6423]],
[[-0.3842, -0.0349, 0.1621, 0.6400],
[ 0.6776, -0.4812, -0.3169, -0.7976],
[-0.5251, -0.1258, 0.9381, -0.2379],
[-0.1517, 0.7164, 0.8332, 0.1668]]])
Using nn.CosineSimilarity:
tensor([[-0.0230, 0.5513, 0.7093, -0.6423],
[-0.3842, -0.4812, 0.9381, 0.1668]])
```
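
For reference, here is a minimal sketch of two ways the full matrix could be obtained. It is not a confirmed fix for the memory error. The helper names `pairwise_cosine` and `pairwise_cosine_chunked`, the `chunk_size` parameter, and the choice to move each block to the CPU are all assumptions made for illustration. The first helper relies on `F.cosine_similarity` broadcasting its inputs, which may still allocate a large intermediate, so it is unlikely to save memory by itself. The second keeps `torch.bmm` but processes a slice of columns of the first input at a time, so only one block of the result has to sit on the GPU.

```python
import torch
import torch.nn.functional as F


def pairwise_cosine(x, y, eps=1e-6):
    """Full cosine matrix via broadcasting (hypothetical helper).

    x: (B, D, N), y: (B, D, M) -> (B, N, M)
    """
    # (B, D, N, 1) against (B, D, 1, M), reduced over the feature dim D
    return F.cosine_similarity(x.unsqueeze(3), y.unsqueeze(2), dim=1, eps=eps)


def pairwise_cosine_chunked(x, y, chunk_size=1024, eps=1e-6):
    """Same result, computed block by block (hypothetical helper).

    Each (B, chunk, M) block is moved to the CPU as soon as it is computed;
    that offloading step is an assumption about where the result should live.
    """
    x_norm = x / x.norm(p=2, dim=1, keepdim=True).clamp_min(eps)
    y_norm = y / y.norm(p=2, dim=1, keepdim=True).clamp_min(eps)
    blocks = []
    for start in range(0, x.shape[2], chunk_size):
        x_block = x_norm[:, :, start:start + chunk_size]  # (B, D, chunk)
        blocks.append(torch.bmm(x_block.transpose(1, 2), y_norm).cpu())
    return torch.cat(blocks, dim=1)  # (B, N, M)


torch.manual_seed(0)
input1 = torch.randn(2, 4, 4)
input2 = torch.randn(2, 4, 4)

full = pairwise_cosine(input1, input2)
chunked = pairwise_cosine_chunked(input1, input2, chunk_size=3)
print(full.shape, chunked.shape)
```

In both cases the diagonal of each `NxN` matrix should match what `nn.CosineSimilarity(dim=1)` returns for the same inputs. Note that the full `BxNxN` result still has to fit somewhere, so chunking only helps if the blocks are offloaded or reduced as they are produced.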