Does torch provide softmax and matmul with a different shape for each sample?

Hi, I am working on a dynamic model.
For each sample, the input has a different length, and I do cross attention, so the attention map has a different shape for each sample.

For example, in a batch we have an attention map shaped (4x3) for the first sample but (4x2) for the second one.
The matmul that follows is (4x3)x(3xd) for the first sample and (4x2)x(2xd) for the second, so the shapes differ across samples.

Is there a convenient way to do softmax and matmul on different lengths in parallel, without padding everything to the same shape or writing a for loop over each sample in the batch?


Hi Jinyoung!

No, pytorch does not support “ragged tensors,” that is, tensors whose
slices are of differing shapes.

Either padding (the slices of) your tensors to have the same shapes,
or looping over the tensors of differing shapes, would be the most
practical approach.
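As a minimal sketch of the padding approach (the sizes here are made up to match your example: query length 4, key/value lengths 3 and 2, and a hypothetical feature dim d = 5), you can pad the keys and values to the longest length in the batch and mask the padded positions with -inf before the softmax, so they get zero attention weight:

```python
import torch

d = 5
lengths = [3, 2]           # per-sample key/value lengths (ragged)
L_max = max(lengths)

q = torch.randn(2, 4, d)   # queries: (batch, 4, d)

# Per-sample keys/values of differing lengths, zero-padded to L_max.
ks = [torch.randn(n, d) for n in lengths]
vs = [torch.randn(n, d) for n in lengths]
k = torch.stack([torch.cat([t, t.new_zeros(L_max - t.size(0), d)]) for t in ks])
v = torch.stack([torch.cat([t, t.new_zeros(L_max - t.size(0), d)]) for t in vs])

# Boolean mask: True at valid (unpadded) key positions.  (batch, L_max)
mask = torch.arange(L_max)[None, :] < torch.tensor(lengths)[:, None]

scores = q @ k.transpose(1, 2)                         # (2, 4, L_max)
scores = scores.masked_fill(~mask[:, None, :], float('-inf'))
attn = scores.softmax(dim=-1)                          # padded positions -> weight 0
out = attn @ v                                         # (2, 4, d)
```

Because the padded positions carry zero attention weight, the zero-padded rows of `v` contribute nothing to `out`, so the result matches what the per-sample (4x3)x(3xd) and (4x2)x(2xd) matmuls would give, while everything runs as one batched operation.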


K. Frank
