Left / right side padding

I am confused about the padding side. How do we decide which side to add padding on?
Also, isn't the attention mask available to mask out the padding tokens? If so, why does the padding side matter?
Lastly, for multimodal data (say, image + text input with text output), how do we decide the padding side?
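
To make the question concrete, here is a minimal sketch of what I mean by the two padding sides. It is plain Python rather than any particular library's API: `pad_batch` is a hypothetical helper, the token ids are made up, and using `0` as the pad id is just an assumption for illustration.

```python
def pad_batch(batch, pad_id=0, side="right"):
    """Pad lists of token ids to a common length.

    Returns (input_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding. Hypothetical helper for illustration only.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")

    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        pads = [pad_id] * n_pad
        zeros = [0] * n_pad
        ones = [1] * len(seq)
        if side == "left":
            # Padding goes before the real tokens.
            input_ids.append(pads + list(seq))
            attention_mask.append(zeros + ones)
        else:
            # Padding goes after the real tokens.
            input_ids.append(list(seq) + pads)
            attention_mask.append(ones + zeros)
    return input_ids, attention_mask


# Made-up token ids: one sequence of length 3, one of length 2.
batch = [[11, 12, 13], [21, 22]]

right_ids, right_mask = pad_batch(batch, side="right")
left_ids, left_mask = pad_batch(batch, side="left")

print("right:", right_ids, right_mask)
print("left: ", left_ids, left_mask)
```

In both cases the mask marks the same real tokens; only the position of the padding changes. With left padding the last position of every row holds a real token, whereas with right padding it may hold a pad token.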