Different Input Shapes for Triplet Loss

I’m using triplet loss on a siamese network where each half receives a different-sized input (128 and 512). How would you recommend that I compute the loss? Would expanding the 128 by copying it 4 times mess with autograd? For now I am just padding the input, but I was wondering if there is a more efficient way.


From the autograd point of view, anything that you do (as long as it is differentiable) will be fine.

What you should do then depends on your application.
I guess the 3 options I can see are: interpolate to scale one up (or down) to match the size of the other, pad or crop, or replicate as you mentioned.
Which one works best will depend a lot on your use case, so you'll want to test them and see how it goes.
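As a rough illustration of those three options, here is a minimal sketch. The shapes (batch of 2, 16 features per token) and the variable names are assumptions made up for the example, not taken from the original model.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: (batch, tokens, features)
anchor = torch.randn(2, 128, 16, requires_grad=True)
positive = torch.randn(2, 512, 16)

# Option 1: interpolate along the token dimension.
# F.interpolate treats dim 1 as channels, so move tokens to the last dim first.
interp = F.interpolate(
    anchor.transpose(1, 2), size=512, mode="linear", align_corners=False
).transpose(1, 2)

# Option 2: zero-pad the token dimension from 128 to 512.
# The pad tuple starts from the last dim: (feat_left, feat_right, tok_left, tok_right).
padded = F.pad(anchor, (0, 0, 0, 512 - 128))

# Option 3: replicate the 128 tokens 4 times.
repeated = anchor.repeat(1, 4, 1)

print(interp.shape, padded.shape, repeated.shape)
```

All three results have the same shape as `positive`, so any of them can be passed to the loss. Cropping the larger tensor instead (for example `positive[:, :128]`) is the fourth variant, at the cost of discarding tokens.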

I’m sorry if I wasn’t clear enough. I’m using an encoder/decoder and I’m performing triplet loss on the output of the encoder. The anchor is a different size from the positive and negative. Currently I am padding the anchor before it’s encoded, but that wastes memory and I’m very tight on GPU memory. So I’m wondering if I can pad the encoder output, or copy the existing 128 tokens 4 times, to create a 512-token input for the loss. Will replication or padding of the encoder output make autograd unable to accurately update the weights in my encoder?

Both replication and padding are differentiable, so there is no issue on the autograd side: gradients will flow back to your encoder either way.
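If you want to convince yourself, a small check like the one below works. It is only a sketch: the tiny tensor `x` stands in for the encoder output, and the sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

# Stand-in for an encoder output: (batch, tokens, features)
x = torch.ones(1, 3, 2, requires_grad=True)

# Replication: each element appears 4 times in the result,
# so its gradient is the sum over the 4 copies.
x.repeat(1, 4, 1).sum().backward()
grad_rep = x.grad.clone()

# Padding: the appended zeros are constants,
# so each original element gets the gradient of its single copy.
x.grad = None
F.pad(x, (0, 0, 0, 9)).sum().backward()
grad_pad = x.grad.clone()

print(grad_rep)  # every entry is 4
print(grad_pad)  # every entry is 1
```

One practical consequence: with replication the gradient reaching the encoder is accumulated over the copies, so its scale differs from the padded case even though both are valid.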


Awesome, thank you so much!