Hi, I am working on a dynamic model.
For each sample the input has a different length, so when I do cross attention the resulting attention maps have different shapes across samples.
For example, in one batch the attention map is (4×3) for the first sample but (4×2) for the second.
The matmul that follows is likewise (4×3)·(3×d) for the first sample and (4×2)·(2×d) for the second, so it also differs between samples.
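For reference, the per-sample computation described above (softmax over each sample's key axis, then multiplication by that sample's values) looks roughly like this. It is the Python loop the question wants to avoid. NumPy, the function name `per_sample_attention`, and the sizes (4 queries, d = 5) are illustrative assumptions, not taken from the original model.

```python
import numpy as np

def per_sample_attention(scores_list, values_list):
    """Hypothetical reference: softmax over keys, then weights @ values, one sample at a time."""
    outputs = []
    for scores, values in zip(scores_list, values_list):
        # scores: (num_queries, L_i), values: (L_i, d); L_i varies per sample
        shifted = scores - scores.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        weights = np.exp(shifted)
        weights /= weights.sum(axis=1, keepdims=True)          # softmax along the key axis
        outputs.append(weights @ values)                       # (num_queries, d)
    return outputs

rng = np.random.default_rng(0)
d = 5
lengths = [3, 2]  # key lengths of the two samples in the example
scores_list = [rng.standard_normal((4, L)) for L in lengths]
values_list = [rng.standard_normal((L, d)) for L in lengths]
outs = per_sample_attention(scores_list, values_list)
print([o.shape for o in outs])  # [(4, 5), (4, 5)]
```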
Is there a convenient way to do the softmax and the matmul over different lengths in parallel, without padding everything to the same shape or writing a "for loop" over each sample in the batch?
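One way to avoid both padding and a per-sample loop is to pack the batch: concatenate every sample's score map along the key axis and every value matrix along its rows, then compute a segmented softmax and a segmented sum keyed on the per-sample lengths. The sketch below does this with NumPy's `ufunc.reduceat`. The function name `packed_attention` is hypothetical, and it assumes every sample has the same number of queries (4 in the example) and at least one key, because `reduceat` does not produce an empty reduction for a zero-length segment.

```python
import numpy as np

def packed_attention(scores_packed, values_packed, lengths):
    """Hypothetical segmented softmax + matmul over a packed (concatenated) batch.

    scores_packed: (num_queries, total_len)  per-sample score maps concatenated along the key axis
    values_packed: (total_len, d)            per-sample value matrices concatenated along rows
    lengths:       per-sample key lengths (assumed all > 0)
    Returns:       (batch, num_queries, d)
    """
    lengths = np.asarray(lengths)
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))  # start index of each segment
    # Per-segment max for numerical stability, broadcast back to every key column.
    seg_max = np.maximum.reduceat(scores_packed, offsets, axis=1)  # (q, B)
    shifted = scores_packed - np.repeat(seg_max, lengths, axis=1)  # (q, total_len)
    exp = np.exp(shifted)
    seg_sum = np.add.reduceat(exp, offsets, axis=1)                # (q, B)
    weights = exp / np.repeat(seg_sum, lengths, axis=1)            # softmax within each segment
    # Segmented matmul: scale each value row by its weight, then sum within each segment.
    weighted = weights[:, :, None] * values_packed[None, :, :]     # (q, total_len, d)
    out = np.add.reduceat(weighted, offsets, axis=1)               # (q, B, d)
    return out.transpose(1, 0, 2)                                  # (B, q, d)

rng = np.random.default_rng(0)
lengths = [3, 2]
d = 5
scores_list = [rng.standard_normal((4, L)) for L in lengths]
values_list = [rng.standard_normal((L, d)) for L in lengths]
out = packed_attention(np.concatenate(scores_list, axis=1),
                       np.concatenate(values_list, axis=0),
                       lengths)
print(out.shape)  # (2, 4, 5)
```

The `(q, total_len, d)` intermediate trades memory for removing the loop. In PyTorch the same packed layout is usually handled with scatter-style reductions such as `Tensor.index_add_`, or with the nested-tensor support in `torch.nested`; which operations accept ragged shapes varies by version, so check the current documentation.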
Thanks.