I’m sorry if I wasn’t clear enough. I’m using an encoder/decoder and computing a triplet loss on the encoder’s output. The anchor is a different size than the positive and negative. Currently I pad the anchor before it’s encoded, but that wastes memory, and I’m very tight on GPU memory. So I’m wondering if I can instead pad the encoder output, or copy the existing 128 tokens 4 times to create a 512-length input for the loss. Would replicating or padding the encoder output prevent autograd from accurately updating the weights in my encoder?
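To make the two options concrete, here’s a minimal sketch of what I mean. The shapes, the 128/512 lengths, and names like `anchor_out` are just placeholders standing in for my real encoder outputs, not my actual model:

```python
import torch
import torch.nn.functional as F

# Hypothetical encoder outputs: anchor is 128 tokens, positive/negative are 512.
batch, d_model = 8, 256
anchor_out = torch.randn(batch, 128, d_model, requires_grad=True)  # encoder(anchor)
pos_out = torch.randn(batch, 512, d_model, requires_grad=True)     # encoder(positive)
neg_out = torch.randn(batch, 512, d_model, requires_grad=True)     # encoder(negative)

# Option 1: zero-pad the 128-token output up to 512 along the sequence dim.
anchor_padded = F.pad(anchor_out, (0, 0, 0, 512 - anchor_out.size(1)))

# Option 2: tile the 128 tokens 4 times to reach 512.
anchor_tiled = anchor_out.repeat(1, 4, 1)

# Flatten sequence and feature dims so TripletMarginLoss sees (N, D) embeddings.
loss_fn = torch.nn.TripletMarginLoss(margin=1.0)
loss = loss_fn(anchor_tiled.flatten(1), pos_out.flatten(1), neg_out.flatten(1))
loss.backward()  # gradients flow back through repeat/pad into anchor_out
```

Both `F.pad` and `repeat` are differentiable (repeated tokens just accumulate their gradients back onto the original 128 positions), so this is the kind of setup I’m asking about.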