Doubts about nn.Transformer() module

Hi, I am using nn.Transformer() for the first time, and I have a basic doubt about it. As I understand it, both the src and tgt inputs have shape (batch_size, sequence_length, d_model) when batch_first=True is set (the default layout is (sequence_length, batch_size, d_model)). I am working on a video frame scoring task, where I want to score every frame of a video using a transformer. I extract the frame-level features with a pre-trained ResNet. So in my case, src has shape (batch_size, sequence_length, 1024) and tgt has shape (batch_size, sequence_length, 1). Should I use two separate linear layers to project src and tgt to d_model (1024 to d_model and 1 to d_model, respectively), or can src and tgt have different feature dimensions?
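
For reference, here is a minimal sketch of the first option (two separate linear projections). nn.Transformer checks that the last dimension of both src and tgt equals d_model and raises an error otherwise, so some projection is needed for both inputs. The class name FrameScorer, the small d_model, nhead and layer counts, and the final scoring head are hypothetical choices for illustration. Positional encoding is left out for brevity, and tgt is fed in directly with a causal mask, which is an assumption about the training setup rather than something stated in the question.

```python
import torch
import torch.nn as nn


class FrameScorer(nn.Module):
    """Hypothetical wrapper: projects src and tgt to d_model before nn.Transformer."""

    def __init__(self, feat_dim=1024, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        # Separate projections, since src and tgt start with different feature sizes.
        self.src_proj = nn.Linear(feat_dim, d_model)  # 1024 -> d_model
        self.tgt_proj = nn.Linear(1, d_model)         # 1 -> d_model
        self.transformer = nn.Transformer(
            d_model=d_model,
            nhead=nhead,
            num_encoder_layers=num_layers,
            num_decoder_layers=num_layers,
            dim_feedforward=128,
            batch_first=True,  # inputs are (batch_size, sequence_length, features)
        )
        # Maps each decoder output vector back to one score per frame.
        self.head = nn.Linear(d_model, 1)

    def forward(self, src, tgt):
        src = self.src_proj(src)  # (batch, seq, d_model)
        tgt = self.tgt_proj(tgt)  # (batch, seq, d_model)
        # Causal mask so position i only attends to tgt positions <= i (assumed setup).
        tgt_mask = self.transformer.generate_square_subsequent_mask(tgt.size(1))
        tgt_mask = tgt_mask.to(src.device)
        out = self.transformer(src, tgt, tgt_mask=tgt_mask)
        return self.head(out)  # (batch, seq, 1)


model = FrameScorer()
src = torch.randn(2, 5, 1024)  # ResNet frame features
tgt = torch.randn(2, 5, 1)     # per-frame scores
scores = model(src, tgt)
print(scores.shape)
```

With this layout, the raw src and tgt can keep their original feature sizes (1024 and 1); only the projected tensors passed to the transformer need to share d_model.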