Hi,
I have a two-stream conv network, one stream of which is a vision subnet. The output of this subnet is a tensor of dimension [b, 14, 14, 128], where b is the batch size. The output of the second subnet is a tensor of dimension [b, 128]. I need to compute a scalar product to measure the similarity between each of the 14x14 128-dim vectors from the visual subnet and the corresponding 128-dim vector from the second subnet, so that the result is a [b, 14, 14, 1] similarity map. I tried using torch.dot() and torch.bmm(), but I am getting errors and I am not sure how I should transform the dimensions to get the 14x14x1 similarity map. I would appreciate any guidance. Here is a portion of my code:
def forward(self, x_v, x_a):
    v_out = self.vfeatures(x_v)  # [b, 14, 14, 128] visual feature map
    a_out = self.afeatures(x_a)  # [b, 128] embedding from the second subnet
    # The similarity map between the embeddings needs to be a [b, 14, 14, 1] tensor.
    # How can I define the pairwise_sim function to compute the scalar product?
    sim_map = pairwise_sim(v_out, a_out)
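
Conceptually, I think the operation boils down to a dot product over the 128-dim channel axis at every spatial location. The sketch below (using torch.einsum, and assuming the tensors really are channels-last as shown above) is roughly what I have in mind, but I am not sure it is the right or idiomatic way to define pairwise_sim:

import torch

def pairwise_sim(v_out, a_out):
    # v_out: [b, 14, 14, 128] (channels-last), a_out: [b, 128]
    # Dot product over the 128-dim channel axis at every spatial location.
    sim = torch.einsum('bhwc,bc->bhw', v_out, a_out)  # [b, 14, 14]
    return sim.unsqueeze(-1)                          # [b, 14, 14, 1]

Would this be equivalent to plain broadcasting, i.e. (v_out * a_out[:, None, None, :]).sum(dim=-1, keepdim=True), and is one of the two preferred?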
Thank you.