Say I have an original large tensor:
x_large = torch.randn(100_000, 100_000)
Now I do:
x_small = x_large[::1000, ::1000]
and I only want to save this small tensor to disk. If I directly do `torch.save(x_small, <file>)`, it will save the full big data chunk along with `x_small`'s super wide strides (this is how `torch.save` is designed, to preserve storage sharing). This costs huge disk space that I don't need in this particular case. The way to get around this is to save `x_small.contiguous()` instead: I get the contiguous format that is often desired, and the size on disk is compact. In addition, if a tensor is already contiguous, the extra `.contiguous()` call barely has any cost. So, for most "usual" use cases, I just call `.contiguous()` before saving, as a good habit.
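To make the storage-sharing cost concrete, here is a small sketch of the size difference on disk (using smaller sizes than the example above so it runs quickly; the effect is the same):

```python
import os
import tempfile
import torch

# Smaller stand-in sizes so this runs quickly; the effect is the same.
x_large = torch.randn(4_000, 4_000)   # ~64 MB of float32 storage
x_small = x_large[::1000, ::1000]     # a 4x4 strided view (16 elements)

with tempfile.TemporaryDirectory() as d:
    view_path = os.path.join(d, "view.pt")
    contig_path = os.path.join(d, "contig.pt")

    # Saving the view serializes the entire shared storage...
    torch.save(x_small, view_path)
    # ...while saving a contiguous copy serializes only the 16 elements.
    torch.save(x_small.contiguous(), contig_path)

    view_bytes = os.path.getsize(view_path)
    contig_bytes = os.path.getsize(contig_path)

print(view_bytes)    # tens of MB
print(contig_bytes)  # around a kilobyte
```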
However, this does not solve all the cases. Consider again the `x_large` above, and suppose my `x_small2` is just:
x_small2 = x_large[0]
In this case, `x_small2` is already contiguous, so `.contiguous()` is a no-op that returns the same tensor, and the underlying storage still points to the big chunk. If I want to save the "actual" `x_small2`, I'll have to do:
torch.save(x_small2.contiguous().clone(), <file>)
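A quick sketch of why `.contiguous()` alone doesn't help in this case: on an already-contiguous tensor it returns the same tensor, so only `.clone()` actually detaches the storage (the `untyped_storage()` accessor below assumes PyTorch ≥ 2.0):

```python
import torch

x_large = torch.randn(1_000, 1_000)
x_small2 = x_large[0]   # first row: a contiguous view into x_large

# .contiguous() is a no-op on an already-contiguous tensor...
same = x_small2.contiguous()
print(same.data_ptr() == x_large.data_ptr())   # True: still the big storage
print(same.untyped_storage().nbytes())         # 4,000,000 bytes

# ...while .clone() copies into a fresh, minimal storage.
detached = x_small2.clone()
print(detached.data_ptr() == x_large.data_ptr())   # False
print(detached.untyped_storage().nbytes())         # 4,000 bytes
```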
Now, to generalize this "good practice", I'd have to `clone()` every time before saving a tensor. Are there any less tedious / more recommended ways to accomplish this save task?
Thank you!