I am working with two-dimensional input:

batch_size = 23
features = 34
Ex: data = np.random.uniform(-1, 1, [batch_size, features])
I want to apply multi-head attention to this data, but multi-head attention expects input of shape [batch_size x sequence_length x embedding_dim].
I don't want to use an embedding layer before the attention (i.e. two dims => Embedding => three dims), as we do with an LSTM.
What are the ways to feed this input to multi-head attention?
Expanding a new dim in the existing data is one solution, but should I expand at the 1st axis (np.expand_dims(data, axis=1)) or the 2nd axis (np.expand_dims(data, axis=2))?
Thank you!