# Need help on identifying the proper operations for 2 sequential tensors

Hi,

I have a tensor X, which is of shape:

`batch_size X number_of_sequence X hidden_size`

ex: 10 X 253 X 768 … This tensor X is from the output of an LSTM.

There is another tensor Y, of shape:

`batch_size X number_of_sequence X embedding_size`

ex: 10 X 253 X 300 … This tensor Y is the output from an Embedding.

I need to work with these 2 tensors X and Y and feed them to an attention network to match the sequence Y to each element in X. I need a bit of help deciding which operation would be better for combining X and Y. I mean, if I do `torch.cat((X, Y), dim=2)`, would that be a good idea?
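Just to make the concatenation idea concrete, here is a minimal sketch (using the example shapes from above) of what `torch.cat` along `dim=2` produces:

```python
import torch

# Shapes from the example above: batch 10, sequence length 253
X = torch.randn(10, 253, 768)  # LSTM output
Y = torch.randn(10, 253, 300)  # embedding output

# Concatenating along the feature dimension simply stacks the two
# representations side by side; the feature size becomes 768 + 300.
Z = torch.cat((X, Y), dim=2)
print(Z.shape)  # torch.Size([10, 253, 1068])
```

Note that the result has a new feature size (1068), so whatever consumes it must expect that dimension.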

To match Y to each element in X, they both should have the same feature dimension, don't you think? I.e., either 768 or 300.

The feature dimensions of these 2 tensors X and Y are different, and in my case I want the dimension of X. So what I can do is pass Y through an LSTM and get it to the same dimension as X.

Then I can do a point-wise addition between X and Y to get a unified weighted tensor.

Can you tell me whether point-wise addition would be a good operation in this case, or whether there are other, better operations?
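Here is a hedged sketch of the approach described above (the LSTM hyperparameters and variable names are my own assumptions): pass Y through an LSTM whose hidden size matches X, then add element-wise.

```python
import torch
import torch.nn as nn

batch, seq_len, hidden, emb = 10, 253, 768, 300

X = torch.randn(batch, seq_len, hidden)  # LSTM output
Y = torch.randn(batch, seq_len, emb)     # embedding output

# Assumption: a single-layer LSTM that maps Y from embedding_size (300)
# to hidden_size (768) so that it lines up with X.
proj_lstm = nn.LSTM(input_size=emb, hidden_size=hidden, batch_first=True)
Y_proj, _ = proj_lstm(Y)                 # (10, 253, 768)

# Point-wise (element-wise) addition now works because the shapes match.
fused = X + Y_proj
print(fused.shape)  # torch.Size([10, 253, 768])
```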

I have different types of attention mechanisms (one-directional, bi-directional, etc.),

Mostly, they create an M x N affinity matrix (M = sequence length of X, N = sequence length of Y) via a dot (vector) product between X and Y, and use this affinity matrix to summarise X and/or Y.

Hey there!

I am a bit confused. How would you do a dot product between two tensors with different feature dimensions?

Can you please share some code with me?

For example:

`x = torch.randn(2, 3, 10) # batch_size X number_of_sequence X hidden_size`
`y = torch.randn(2, 3, 5) # batch_size X number_of_sequence X embedding_size`

and I need the output to be also of shape `(2, 3, 10)`.

How can I do this?
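One common workaround (a hedged sketch, not something stated in this thread; the layer names are my own assumptions) is to first project `y` into `x`'s feature space with a learned linear layer, then use standard dot-product attention so the output keeps `x`'s shape:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 3, 10)  # batch_size X number_of_sequence X hidden_size
y = torch.randn(2, 3, 5)   # batch_size X number_of_sequence X embedding_size

# Assumption: a learned projection lifting y from 5 to 10 features.
proj = nn.Linear(5, 10)
y_proj = proj(y)                                  # (2, 3, 10)

# Affinity matrix via a batched dot product: (2, 3, 3)
scores = torch.bmm(x, y_proj.transpose(1, 2))
alpha = F.softmax(scores, dim=-1)

# Summarise y_proj for every position of x; the result has x's shape.
out = torch.bmm(alpha, y_proj)                    # (2, 3, 10)
print(out.shape)  # torch.Size([2, 3, 10])
```

The projection is what makes the dot product possible at all; without matching feature dimensions, `torch.bmm` between `x` and `y` would fail.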

Please let me elaborate a bit more. I am working on a question-answering task. Instead of a CNN, the input comes from a word embedding.

So, I have 2 tensors.

1. An output from an LSTM (in the attached figure, this is the output from the LSTM in `First attention` to x2_1) that holds the representations of:

1.1 the input words (X_1) and other features.
1.2 the question embedding on the input words.

Let’s say this tensor is `hidden_docs`. The shape of this tensor is batch_size X sequence_length X hidden_size. For example, `hidden_docs = tensor(10, 253, 768)`

2. X_1 is the input words representation.

Let’s say this tensor is `x1_emb`. The shape of this tensor is batch_size X sequence_length X embedding_size. For example, x1_emb = tensor(10, 253, 300)

I need to project `x1_emb` onto `hidden_docs` to get a unified representation of `hidden_docs` and `x1_emb`. This representation then needs to be fed into x2_1 (in the ‘Attention again’).

Currently, since the shapes are different, I am passing `x1_emb` into an LSTM, which gives it the shape batch_size X sequence_length X hidden_size.

And then I am doing a point-wise addition between `hidden_docs` and `x1_emb` before passing the output to x2_1. Now here I am a bit confused.

Is point-wise addition between these two tensors the right operation? Are there other, better operations to capture a better joint representation of `hidden_docs` and `x1_emb`?
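To put the two fusion options mentioned in this thread side by side, here is a hedged sketch (the layer names are assumptions, and `x1_proj` stands for `x1_emb` after the LSTM projection described above): point-wise addition versus concatenation followed by a linear layer that learns the mixing.

```python
import torch
import torch.nn as nn

batch, seq_len, hidden = 10, 253, 768

hidden_docs = torch.randn(batch, seq_len, hidden)  # LSTM output
x1_proj = torch.randn(batch, seq_len, hidden)      # x1_emb after the LSTM projection

# Option 1: point-wise addition (what the post describes).
fused_add = hidden_docs + x1_proj

# Option 2 (assumption, not from the figure): concatenate the two
# representations and let a linear layer learn how to mix them.
mix = nn.Linear(2 * hidden, hidden)
fused_cat = mix(torch.cat((hidden_docs, x1_proj), dim=2))

print(fused_add.shape, fused_cat.shape)  # both torch.Size([10, 253, 768])
```

Both variants keep the shape expected by x2_1; which works better can only be determined empirically.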

I understand. I’m not sure I could say which operations are right or wrong. In general, I have seen papers use addition as well as dot-product-based attention mechanisms. You can only evaluate it empirically, I guess.