Padding and masking for variable-size tensors can decrease code readability, and padding along large dimensions can hurt performance.
To mitigate this, I’ve been exploring whether I could refactor a model to use nested tensors. After a few unsuccessful attempts, and given that development on nested tensors has been deprioritized, I’m going to stop here and look for other refactoring options.
Ragged tensors in TensorFlow have been around for a while and come with many examples. By contrast, the lack of documentation and real-world use cases for their PyTorch counterpart made me wonder:
- Do PyTorch practitioners actually use this feature, or do they still mostly rely on padding?
- If they rely on padding, how do they work around the performance and readability issues that come with it? (e.g., batching records with similar shapes, flattening tensors and storing offsets, or attaching masks to values with a TensorDict)
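To make the "flattening tensors and storing offsets" workaround concrete, here is a minimal sketch in plain Python (the helper names `flatten_with_offsets` and `unflatten` are my own, not an established API). The idea is to concatenate variable-length records into one flat buffer and record where each record starts, so no padding values are ever materialized:

```python
def flatten_with_offsets(records):
    """Concatenate variable-length records into one flat list,
    recording the start offset of each record."""
    flat, offsets = [], [0]
    for record in records:
        flat.extend(record)
        offsets.append(len(flat))
    return flat, offsets

def unflatten(flat, offsets):
    """Recover the original ragged records from the flat buffer."""
    return [flat[offsets[i]:offsets[i + 1]] for i in range(len(offsets) - 1)]

records = [[1, 2, 3], [4], [5, 6]]
flat, offsets = flatten_with_offsets(records)
# flat    == [1, 2, 3, 4, 5, 6]
# offsets == [0, 3, 4, 6]
assert unflatten(flat, offsets) == records
```

The same layout carries over to tensors (one 1-D values tensor plus an offsets tensor), which is essentially what ragged/jagged representations store under the hood.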