Are PyTorch practitioners actually using nested tensors?

Padding and masking for variable-size tensors can decrease code readability, and padding along large dimensions can hurt performance.

To mitigate this, I’ve been exploring whether I could refactor a model using nested tensors. After a few unsuccessful attempts and considering that development on nested tensors has been deprioritized, I’m going to stop here and look for other refactoring options.

Ragged tensors in TensorFlow have been around for a while and come with many examples. However, the lack of documentation and real use cases for their PyTorch counterpart made me wonder:

  • Do PyTorch practitioners actually use this feature, or do they still mostly rely on padding?
  • If they rely on padding, how do they work around the performance and readability issues that come with it? (e.g., batching records with similar shapes, flattening tensors and storing offsets, attaching masks to values with a `TensorDict`, etc.)
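For context, the "flatten tensors and store offsets" workaround could look like the sketch below. It is a minimal pure-Python illustration (function names are my own); with real tensors you would concatenate with `torch.cat` and slice with the stored offsets instead of padding:

```python
def flatten_with_offsets(records):
    """Concatenate variable-length records into one flat buffer.

    Returns the flat buffer plus an offsets list, where record i
    occupies flat[offsets[i]:offsets[i + 1]].
    """
    flat, offsets = [], [0]
    for rec in records:
        flat.extend(rec)
        offsets.append(len(flat))
    return flat, offsets


def unflatten(flat, offsets):
    """Recover the original records from the flat buffer and offsets."""
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]
```

No memory is wasted on pad values, at the cost of bookkeeping the offsets yourself.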

Sorry, not really an answer to your question, but an alternative to padding and masking could be to ensure that sequences in a batch always have the same length; see related post. This is quite easy to implement with a custom `Sampler` class.
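To make that suggestion concrete, here is a minimal sketch of the idea (the class name and details are my own; in a real project it would subclass `torch.utils.data.Sampler` and be passed to `DataLoader` via the `batch_sampler` argument). It groups dataset indices by sequence length so every batch is padding-free:

```python
import random
from collections import defaultdict


class SameLengthBatchSampler:
    """Sketch: yield batches of indices whose sequences share one length.

    Because all sequences in a batch have the same length, no padding
    or masking is needed within the batch.
    """

    def __init__(self, lengths, batch_size, seed=0):
        self.lengths = lengths      # length of each sequence in the dataset
        self.batch_size = batch_size
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        # Bucket dataset indices by their sequence length.
        by_len = defaultdict(list)
        for idx, n in enumerate(self.lengths):
            by_len[n].append(idx)
        # Shuffle within each bucket, then chunk into batches.
        batches = []
        for idxs in by_len.values():
            rng.shuffle(idxs)
            batches.extend(idxs[i:i + self.batch_size]
                           for i in range(0, len(idxs), self.batch_size))
        rng.shuffle(batches)        # avoid presenting lengths in order
        return iter(batches)
```

One trade-off worth noting: batches at the tail of each length bucket may be smaller than `batch_size`, and rare lengths produce tiny batches.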
