Ragged tensors are useful when inputs vary in length. For example, in graph embeddings, each node may be connected to a varying number of neighbors. Another example is BERT: the inputs are typically sequences of text that are not uniform in length, so the inputs in a batch have to be padded to the length of the longest sample.
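To make the padding workaround concrete, here is a minimal sketch of the current PyTorch approach using torch.nn.utils.rnn.pad_sequence (the token IDs are made up for illustration):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three "tokenized texts" of different lengths (hypothetical token IDs).
batch = [
    torch.tensor([101, 7592, 102]),
    torch.tensor([101, 7592, 2088, 999, 102]),
    torch.tensor([101, 102]),
]

# Every sequence gets padded to the longest one in the batch (length 5 here),
# spending memory and compute on positions that carry no information.
padded = pad_sequence(batch, batch_first=True, padding_value=0)
mask = padded != 0  # an attention mask is then needed to ignore the padding
print(padded.shape)  # torch.Size([3, 5])
```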
Both Keras and core TensorFlow support ragged tensors. I am wondering if there are any plans to introduce these in PyTorch.
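For comparison, a small example of what TensorFlow's tf.ragged.RaggedTensor allows: the variable-length sequences are stored without padding, and densification is opt-in.

```python
import tensorflow as tf

# The same three variable-length sequences, stored without padding.
rt = tf.ragged.constant([[101, 7592, 102],
                         [101, 7592, 2088, 999, 102],
                         [101, 102]])
print(rt.shape)  # (3, None) -- the second dimension is ragged

# Operations work directly on the ragged structure...
lengths = rt.row_lengths()  # [3, 5, 2]

# ...and padding only happens when a dense tensor is explicitly requested.
dense = rt.to_tensor(default_value=0)  # shape (3, 5)
```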