Say I have an original large tensor:
x_large = torch.randn(100_000, 100_000)
Now I do:
x_small = x_large[::1000, ::1000]
and I only want to save this small tensor to disk. If I directly do `torch.save(x_small, <file>)`, it will save the full big data chunk along with `x_small`'s super wide strides (this is how `torch.save` is designed, to preserve storage sharing). This costs huge disk space that I don't need in this particular case. The way to get around this is to save `x_small.contiguous()` instead: I get the contiguous format that is often desired, and the size on disk is compact. In addition, if a tensor is already contiguous, the extra `.contiguous()` call barely has any cost. So, for most "usual" use cases, I just call `.contiguous()` before saving, as a good habit.
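To make the storage-sharing cost concrete, here is a small sketch of the size difference on disk (using smaller sizes than the example above so it runs quickly; the effect is the same):

```python
import os
import tempfile
import torch

# Smaller stand-in sizes so this runs quickly; the effect is the same.
x_large = torch.randn(4_000, 4_000)   # ~64 MB of float32 storage
x_small = x_large[::1000, ::1000]     # a 4x4 strided view (16 elements)

with tempfile.TemporaryDirectory() as d:
    view_path = os.path.join(d, "view.pt")
    contig_path = os.path.join(d, "contig.pt")

    # Saving the view serializes the entire shared storage...
    torch.save(x_small, view_path)
    # ...while saving a contiguous copy serializes only the 16 elements.
    torch.save(x_small.contiguous(), contig_path)

    view_bytes = os.path.getsize(view_path)
    contig_bytes = os.path.getsize(contig_path)

print(view_bytes)    # tens of MB
print(contig_bytes)  # around a kilobyte
```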
However, this does not solve all the cases. Consider again the `x_large` above, and suppose my `x_small2` is just:
x_small2 = x_large[0]
In this case, `x_small2` is already contiguous, so `.contiguous()` is a no-op that returns the same tensor, and the underlying storage still points to the big chunk. If I want to save the "actual" `x_small2`, I'll have to do:
torch.save(x_small2.contiguous().clone(), <file>)
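A quick sketch of why `.contiguous()` alone doesn't help in this case: on an already-contiguous tensor it returns the same tensor, so only `.clone()` actually detaches the storage (the `untyped_storage()` accessor below assumes PyTorch ≥ 2.0):

```python
import torch

x_large = torch.randn(1_000, 1_000)
x_small2 = x_large[0]   # first row: a contiguous view into x_large

# .contiguous() is a no-op on an already-contiguous tensor...
same = x_small2.contiguous()
print(same.data_ptr() == x_large.data_ptr())   # True: still the big storage
print(same.untyped_storage().nbytes())         # 4,000,000 bytes

# ...while .clone() copies into a fresh, minimal storage.
detached = x_small2.clone()
print(detached.data_ptr() == x_large.data_ptr())   # False
print(detached.untyped_storage().nbytes())         # 4,000 bytes
```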
Now, to generalize this "good practice", I'd have to `clone()` every time before saving a tensor. Are there any less tedious / more recommended ways to accomplish this save task?
Thank you!