Hi all!
I have a question regarding Softmax.
Suppose I have two tensors with a batch dimension, like this:
import torch
import torch.nn.functional as F
A = torch.Tensor([[1], [2], [3]]).float()[None]  # tensor shape >> (1,3,1)
B = torch.Tensor([[5], [2], [6]]).float()[None]  # tensor shape >> (1,3,1)
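A quick shape check, just to confirm the comments above:
print(A.shape, B.shape)  # torch.Size([1, 3, 1]) torch.Size([1, 3, 1])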
# (1, 3, 1) * (1, 1, 3) broadcasts to (1, 3, 3)
outer_product = A.view(A.shape[0], A.shape[1], -1) * B.view(B.shape[0], -1, B.shape[1])
#outer_product:
#tensor([[[ 5., 2., 6.],
# [10., 4., 12.],
# [15., 6., 18.]]])
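If I understand broadcasting correctly, this elementwise multiply should be the same as a batched outer product; here is a quick sanity check of that assumption using torch.bmm:
check = torch.bmm(A, B.transpose(1, 2))  # (1, 3, 1) @ (1, 1, 3) -> (1, 3, 3)
print(torch.allclose(check, outer_product))  # True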
outpro_SM = F.softmax(outer_product, dim=-1)
# output of outpro_SM is:
#tensor([[[2.6539e-01, 1.3213e-02, 7.2140e-01],
# [1.1917e-01, 2.9539e-04, 8.8054e-01],
# [4.7426e-02, 5.8528e-06, 9.5257e-01]]])
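As I understand it, dim=-1 means softmax normalizes over the last dimension, so each row should sum to 1:
print(outpro_SM.sum(dim=-1))  # each row sums to (approximately) 1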
Up to this point everything seems to work well. However, when I change the operation from an outer product to an outer addition, like this:
# (1, 3, 1) + (1, 1, 3) broadcasts to (1, 3, 3)
out_add = A.view(A.shape[0], A.shape[1], -1) + B.view(B.shape[0], -1, B.shape[1])
# output of out_add is:
# tensor([[[6., 3., 7.],
# [7., 4., 8.],
# [8., 5., 9.]]])
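One thing I notice here: each row is just [5, 2, 6] shifted by a constant (the corresponding entry of A). A quick check of that observation:
print(out_add[0] - torch.tensor([5., 2., 6.]))
# tensor([[1., 1., 1.],
#         [2., 2., 2.],
#         [3., 3., 3.]])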
outadd_SM = F.softmax(out_add, dim=-1)
# output of outadd_SM is
#tensor([[[0.2654, 0.0132, 0.7214],
# [0.2654, 0.0132, 0.7214],
# [0.2654, 0.0132, 0.7214]]])
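The rows really are all identical, and each one matches softmax of [5, 2, 6] directly, as if the per-row constant offset had no effect at all:
print(torch.allclose(outadd_SM[0, 0], outadd_SM[0, 1]))  # True
print(F.softmax(torch.tensor([5., 2., 6.]), dim=-1))     # tensor([0.2654, 0.0132, 0.7214])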
I am very confused about why the softmax of out_add gives a matrix with identical rows. Am I doing something wrong here?
Any help is greatly appreciated