I am working with two-dimensional input:

batch_size = 23
features = 34
Ex: data = np.random.uniform(-1, 1, [batch_size, features])
I want to apply multi-head attention to this data, but multi-head attention expects input of shape [batch_size x sequence_length x embedding_dim].
I don't want to use an embedding layer before the attention (i.e. two dims => Embedding => three dims), as we do with an LSTM.
What are the ways to feed this input to multi-head attention?
Expanding a new dim in the existing data is one solution, but should I expand at the 1st axis (np.expand_dims(data, axis=1)) or the 2nd axis (np.expand_dims(data, axis=2))?
Thank you!