Using an attention layer instead of cat to combine tensors

Hello,
I have a naive question.
I have seen in several papers that an attention or cross-attention layer is used to combine the outputs of different encoders. How can I use an attention layer in PyTorch to combine two or three tensors instead of concatenating them?
Let’s say I have three embedded tensors t1, t2, and t3 with shapes (128, 100, 512), (128, 32, 512), and (128, 16, 512), and currently I combine them with cat:
torch.cat([t1, t2, t3], dim=1)
How can I use an attention layer instead of cat to combine these three tensors?
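For reference, here is a rough sketch of what I imagine, using nn.MultiheadAttention as a cross-attention layer where t1 attends to the tokens of t2 and t3 (the num_heads=8 is just a placeholder I picked, and I am not sure this is the right way to do it):

import torch
import torch.nn as nn

# toy tensors with the shapes from my question: (batch, seq_len, embed_dim)
t1 = torch.randn(128, 100, 512)
t2 = torch.randn(128, 32, 512)
t3 = torch.randn(128, 16, 512)

# cross-attention layer; embed_dim must match the last dim of the tensors
attn = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)

# use t1 as the query and the tokens of t2 and t3 as keys/values
context = torch.cat([t2, t3], dim=1)                    # (128, 48, 512)
fused, _ = attn(query=t1, key=context, value=context)   # (128, 100, 512)

Is something like this a reasonable replacement for the plain cat, or is there a better pattern?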
Thanks!