How does an encoder-decoder transformer translate two sentences given as one input?

Hi everyone! Does anyone know how an encoder-decoder transformer translates two sentences passed as a single input? For instance, I pass two sentences through the encoder together: “This is a dog. Is that correct answer?” After the decoder predicts “dog” and “.”, I would expect it to predict <eos> (end of sequence) next. So how does an encoder-decoder transformer translate (and separate) two sentences given as one input like this? Furthermore, does an encoder-decoder transformer use segment embeddings in this situation?
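
To make the question concrete, here is a minimal sketch of the setup as I picture it. It is only an illustration under stated assumptions: `toy_tokenize`, `build_sequence`, and the `<eos>` token name are made up for this post and are not any library's API, and it assumes the common convention that the whole text is treated as one sequence with a single end-of-sequence token appended at the very end.

```python
import re

EOS = "<eos>"  # hypothetical end-of-sequence token name, for illustration only


def toy_tokenize(text):
    """Split text into word and punctuation tokens.

    A toy stand-in for a real subword tokenizer.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def build_sequence(text):
    """Assumption: the whole text is ONE sequence with a single EOS at the end."""
    return toy_tokenize(text) + [EOS]


source = "This is a dog. Is that correct answer?"
tokens = build_sequence(source)
print(tokens)

# Under this assumption, the token that follows the first "." is the first
# word of the second sentence, not EOS.
first_period = tokens.index(".")
print(tokens[first_period + 1])
```

If that assumption holds, the target right after “.” would be the start of the second sentence, and <eos> would only come after the final “?”. Whether real encoder-decoder models are actually trained this way is the part I would like confirmed.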