The transformer decoder layer takes the target input and the encoder embeddings as its inputs. What is the shape of its output?
Unlike with an LSTM, how do we do beam search with the transformer decoder layer? The transformer decoder gives the entire prediction at once, i.e. its output shape is the same as the target shape, so it is not obvious how to extend a hypothesis one token at a time.
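
To make the question concrete, here is a minimal sketch of how I understand beam search is usually done with such a decoder: re-run the decoder on each growing prefix and use only the scores at the last position to pick the next token. Everything in it is an assumption for illustration. `toy_decoder` is a hypothetical stand-in for a real transformer decoder followed by a vocabulary projection (it returns one row of log-probabilities per prefix position, mirroring "output shape is the same as the target shape"), `memory` stands in for the encoder embeddings, and the vocabulary and scores are made up.

```python
import math

VOCAB = ["<bos>", "<eos>", "a", "b"]  # made-up vocabulary
BOS, EOS = 0, 1


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def toy_decoder(prefix, memory):
    """Hypothetical stand-in for a transformer decoder plus output projection.

    Returns len(prefix) rows; row t holds log-probabilities for the token
    that follows prefix[t]. This toy favours copying `memory`, then <eos>.
    """
    rows = []
    for t in range(len(prefix)):
        target = memory[t] if t < len(memory) else EOS
        logits = [0.0] * len(VOCAB)
        logits[BOS] = -10.0   # <bos> should essentially never be emitted
        logits[target] = 2.0  # the favoured next token
        rows.append(log_softmax(logits))
    return rows


def beam_search(decoder, memory, beam_size=3, max_len=8):
    beams = [([BOS], 0.0)]  # (token prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            log_probs = decoder(prefix, memory)  # one row per prefix position
            last = log_probs[-1]                 # only the newest position is used
            for tok, lp in enumerate(last):
                candidates.append((prefix + [tok], score + lp))
        # Keep the best `beam_size` extensions across all live hypotheses.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == EOS:
                finished.append((prefix, score))
            else:
                beams.append((prefix, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses that hit max_len without <eos>
    return max(finished, key=lambda c: c[1])


best, score = beam_search(toy_decoder, memory=[2, 3, 2])
print([VOCAB[i] for i in best], score)
```

With a causal (look-ahead) mask, position t only attends to positions up to t, so re-running the decoder on an extended prefix leaves the earlier rows unchanged. That is why only the last row is needed at each step. Real implementations typically cache the earlier computations instead of recomputing them, and often apply length normalization to the scores, which this sketch omits.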