After the forward pass through the Encoder network, encoder_hidden is used as decoder_hidden. But what should I use as the first decoder_input to the Decoder network?
The original tutorial uses a Start Of Sequence token, but I can’t use it because it is encoded as 0. Using 0 as a number will probably give the decoder some additional information.
Apart from that I have no idea what you’re trying to learn here…
You can use any number x; it only has to adhere to the following constraints:
0 <= x <= (M-1), with M being the size of your vocabulary + the number of special tokens. For example, say you have a vocabulary of size 10,000 and 4 special tokens (very common: <SOS>, <EOS>, <PAD>, <UKN> for start of sequence, end of sequence, padding and unknown tokens). Then x can be 0, 1, 2, …, 10,003.
x cannot be taken by a word from your vocabulary. My vocab2idx mapping usually looks like {'<PAD>': 0, '<UKN>': 1, '<SOS>': 2, '<EOS>': 3, 'the': 4, 'a': 5, 'is': 6, ...}. So my start token would be 2.
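The two constraints can be checked with a few lines of plain Python. This is only a minimal sketch: the helper build_vocab2idx, the three-word vocabulary, and batch_size are made up for illustration and are not from the tutorial.

```python
# Special tokens get the first indices, matching the mapping shown above.
SPECIAL_TOKENS = ["<PAD>", "<UKN>", "<SOS>", "<EOS>"]


def build_vocab2idx(words):
    """Hypothetical helper: special tokens first, then words in given order."""
    vocab2idx = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for word in words:
        if word not in vocab2idx:
            vocab2idx[word] = len(vocab2idx)
    return vocab2idx


# Toy vocabulary (assumption); a real one would come from your corpus.
vocab2idx = build_vocab2idx(["the", "a", "is"])
M = len(vocab2idx)              # vocabulary size + number of special tokens
sos_idx = vocab2idx["<SOS>"]    # index used as the first decoder input

# Constraint 1: the index must lie inside the embedding table.
assert 0 <= sos_idx <= M - 1

# Constraint 2: no other entry may share the start token's index.
assert all(idx != sos_idx for tok, idx in vocab2idx.items() if tok != "<SOS>")

# First decoder input for a batch: one start index per sequence.
batch_size = 3  # assumption, for illustration only
first_decoder_input = [[sos_idx] for _ in range(batch_size)]
print(first_decoder_input)
```

In PyTorch this nested list would typically be wrapped as torch.tensor(first_decoder_input, dtype=torch.long) before being passed to the decoder's embedding layer, and M would be the num_embeddings of that layer.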