Conversational chatbot

So, imagine I have a dataset of user conversations (user1, user2), and I want a transformer model to generate a user2 response given the user1 utterance as input, plus the previous history of the conversation, so that the model can generate a plausible response.
For example, here is a dataset with one conversation:

conversations = ['',
                 'hello there how are you ?',
                 'i am doing well . how are you ?',
                 'i am great thanks , do like boating ?',
                 'i like it when i can get away from my job at the grocery store',
                 'its my favorite activity outside of being a doctor . do you like beaches ?',
                 'i give deep sea fishing tours at the beach sometimes',
                 'i volunteer at a farm , do you like animals ?',
                 'i am vegan so i love animals . same here with the farm volunteering',
                 'i too am a vegan , how long for you ?']
One approach is to extract each speaker2 response as the target, and the corresponding speaker1 utterance together with the previous conversation history (empty if there is no previous history) as the input to the encoder network.
The prev_history_with_user1 for each target output will then have dimension (n_conversation, n_history_with_speaker1).
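As a sketch of that pair-extraction step (assuming turns strictly alternate speaker1/speaker2 and that the helper name `build_pairs` is just illustrative, not from any library):

```python
def build_pairs(conversation):
    """Return (history_with_user1, user2_response) pairs for one conversation."""
    turns = [t for t in conversation if t]  # drop the leading empty placeholder
    pairs = []
    history = []
    # step through the turns two at a time: (user1, user2)
    for i in range(0, len(turns) - 1, 2):
        user1, user2 = turns[i], turns[i + 1]
        history.append(user1)
        # encoder input is all history up to and including the current user1 turn
        pairs.append((" ".join(history), user2))
        history.append(user2)
    return pairs

conv = ["hello there how are you ?",
        "i am doing well . how are you ?",
        "i am great thanks , do like boating ?",
        "i like it when i can get away from my job at the grocery store"]
for inp, tgt in build_pairs(conv):
    print(inp, "->", tgt)
```

The first pair's input is just the first user1 turn; each later pair's input is the concatenation of all earlier turns plus the current user1 turn, matching the examples below.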

Assume for the first response:

sample_question_with_history = ['hi , what is your name ?']
sample_output = ['hello . i am jake . nice to meet you .']

For the second response:
sample_example = ['hi , what is your name ? hello . i am jake . nice to meet you . nice to meet you too ! i am vladimir .']
sample_output = ['so do you have a profession ? in an orchestra i am the violinist .']

And so on, until the current conversation is finished.
Now imagine I want to train the transformer with lots of examples. How would you arrange the training data? Since the training data will have shape
(n_examples, n_conversation, n_history_with_speaker1), would concatenating the examples so that the shape becomes (n_examples * n_conversations, n_history_with_speaker1) help? Thank you for your considerate answer.
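For what it's worth, flattening nested per-conversation pairs into one flat list of training examples (and then padding variable-length sequences for batching) could be sketched like this; the names `flatten` and `pad_batch` are illustrative, and the token ids below are dummy values:

```python
def flatten(nested_pairs):
    """[[(inp, tgt), ...], ...]  ->  [(inp, tgt), ...]

    Collapses (n_examples, n_pairs_per_conversation) into one flat axis,
    since each (history, response) pair is an independent training example.
    """
    return [pair for conv_pairs in nested_pairs for pair in conv_pairs]

def pad_batch(token_id_seqs, pad_id=0):
    """Right-pad variable-length token-id sequences to the batch maximum."""
    max_len = max(len(s) for s in token_id_seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in token_id_seqs]

nested = [[("a", "b"), ("a b c", "d")],   # conversation 1: two pairs
          [("x", "y")]]                   # conversation 2: one pair
flat = flatten(nested)                    # three independent examples
padded = pad_batch([[5, 6], [7], [8, 9, 10]])
```

Once flattened, the pairs can be shuffled and batched like any other sequence-to-sequence dataset; the history is already baked into each input string, so the model never needs the original nesting.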