Best practice for serializing and deserializing large tensors

I am using pickle to save some tensors to files (pickle.dump) and later load them back into memory (pickle.load). Some of the tensors are pretty large (the pickle file for one tensor is about 220 MB). Dumping takes several minutes, but loading takes at least 20 minutes. Is pickle the fastest way to do this? Any suggestions to make it run faster? Thanks!
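For reference, here is roughly what I'm doing (the tensor shape and file name are just illustrative):

```python
import pickle

import torch

# Illustrative tensor, ~100 MB of float32 data
x = torch.randn(5000, 5000)

# Serialize to disk
with open("tensor.pkl", "wb") as f:
    pickle.dump(x, f, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize back into memory
with open("tensor.pkl", "rb") as f:
    y = pickle.load(f)
```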

My apologies. Deserialization actually took only a few minutes – there was another bottleneck elsewhere. But I still wonder whether there's a faster way to do this than pickle.

There is torch.save/torch.load. Some people also use NumPy-compatible formats (e.g. plain .npz, which should be pretty efficient, or HDF5, which is cross-platform and popular, e.g. with Keras).
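A minimal sketch of those options (file names are placeholders, and for HDF5 I'm assuming the h5py package):

```python
import h5py
import numpy as np
import torch

x = torch.randn(5000, 5000)

# Option 1: native PyTorch serialization
torch.save(x, "tensor.pt")
y = torch.load("tensor.pt")

# Option 2: NumPy .npz archive
np.savez("tensor.npz", x=x.numpy())
z = torch.from_numpy(np.load("tensor.npz")["x"])

# Option 3: HDF5 via h5py
with h5py.File("tensor.h5", "w") as f:
    f.create_dataset("x", data=x.numpy())
with h5py.File("tensor.h5", "r") as f:
    w = torch.from_numpy(f["x"][:])
```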

Best regards

Thomas