I am trying to do batched beam search in a seq2seq model with batch size = 2 and beam size = 2. The encoder's final hidden state has shape 1x2x100 (beams don't exist yet at the encoder stage). Now it has to be fed into the decoder with two initial states per sentence. Do I need to make it 1x4x100?
After a decoding step we would have 2 hidden states per sentence (beam size = 2, so each sentence spawns two hypotheses, each producing its own hidden state). Along which dimension does PyTorch expect those 4 hidden states?
In other words, I need to feed four hidden states (2 sentences x 2 beams); which axis's dimension should be increased?
Should the hidden state be expanded to 1x4x100 before feeding it to the decoder?
I saw the documentation for the hidden state and it says the shape is (num_layers * num_directions, batch_size, hidden_size), so I think it should be (1, 2x2, 100), i.e. (1, batch_size x beam_size, 100).
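For what it's worth, here is a minimal sketch of what I have in mind (assuming a single-layer unidirectional encoder, so `num_layers * num_directions = 1`): replicate each sentence's hidden state `beam_size` times along the batch dimension (dim 1) with `repeat_interleave`, which keeps the beams of the same sentence adjacent in the expanded batch.

```python
import torch

batch_size, beam_size, hidden_size = 2, 2, 100

# Encoder final hidden state: (num_layers * num_directions, batch, hidden_size)
hidden = torch.randn(1, batch_size, hidden_size)

# Replicate each sentence's state beam_size times along the batch axis (dim=1),
# giving (1, batch * beam, hidden). repeat_interleave keeps beams of the same
# sentence adjacent: [sent0_beam0, sent0_beam1, sent1_beam0, sent1_beam1].
decoder_hidden = hidden.repeat_interleave(beam_size, dim=1)
print(decoder_hidden.shape)  # torch.Size([1, 4, 100])
```

So the decoder just sees an effective batch of `batch_size * beam_size = 4`, and the same layout applies to the cell state if the decoder is an LSTM.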