Padding and masking for variable-size tensors can decrease code readability, and padding along large dimensions can hurt performance.
To mitigate this, I’ve been exploring whether I could refactor a model to use nested tensors. After a few unsuccessful attempts, and given that development on nested tensors has been deprioritized, I’m going to stop here and look for other refactoring options.
Ragged tensors in TensorFlow have been around for a while and come with many examples. By contrast, the lack of documentation and real-world use cases for their PyTorch counterpart made me wonder:
- Do PyTorch practitioners actually use this feature, or do they still mostly rely on padding?
- If they rely on padding, how do they work around the performance and readability issues that come with it? (e.g., batching records with similar shapes, flattening tensors and storing offsets, or attaching masks to values with a TensorDict)
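To make the "flattening tensors and storing offsets" workaround concrete, here is a minimal sketch in plain Python (the helper names `flatten_with_offsets` and `unflatten` are my own, not an established API). The idea is to concatenate variable-length records into one flat buffer and record where each record starts, so no padding values are ever materialized:

```python
def flatten_with_offsets(records):
    """Concatenate variable-length records into one flat list,
    recording the start offset of each record."""
    flat, offsets = [], [0]
    for record in records:
        flat.extend(record)
        offsets.append(len(flat))
    return flat, offsets

def unflatten(flat, offsets):
    """Recover the original ragged records from the flat buffer."""
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

records = [[1, 2, 3], [4], [5, 6]]
flat, offsets = flatten_with_offsets(records)
# flat    == [1, 2, 3, 4, 5, 6]
# offsets == [0, 3, 4, 6]
assert unflatten(flat, offsets) == records
```

The same layout carries over to tensors (one 1-D values tensor plus an offsets tensor), which is essentially what ragged/jagged representations store under the hood.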