Using torch.save and torch.load on very large files

I used torch.save to save two very large tensors: one is the training dataset and the other is the test dataset (split with torch.utils.data.random_split).

The original tensor has shape (4803, 354, 4000).

Saving is very slow (which is understandable), but both files end up the same size on disk, which is odd since the test set is only 20% of the original while the training set is 80%.
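Roughly, the save step looks like this (a minimal sketch: the tensor here is a small placeholder standing in for my real data, and wrapping it in a TensorDataset before random_split is just for illustration; the file names match the ones in the traceback below):

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Small placeholder; the real tensor has shape (4803, 354, 4000) (~27 GB in float32)
full_tensor = torch.randn(48, 354, 400)
full_dataset = TensorDataset(full_tensor)

# 80/20 split
n_train = int(0.8 * len(full_dataset))
training_set, testing_set = random_split(full_dataset, [n_train, len(full_dataset) - n_train])

# Save each split; this is the slow step, and both files come out the same size on disk
torch.save(training_set, 'trainaction.pt')
torch.save(testing_set, 'testaction.pt')
```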
When I try to load the datasets with torch.load I get the error below (note: I am on a GPU kernel, but I get the same error on a CPU kernel).

RuntimeError                              Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 training_set = torch.load('trainaction.pt')
      2 testing_set = torch.load('testaction.pt')
      3
      4 batch_size = 200
      5 DATA_PATH = './data'

~\.conda\envs\py36\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    527         with _open_zipfile_reader(f) as opened_zipfile:
    528             return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
--> 529     return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    530
    531

~\.conda\envs\py36\lib\site-packages\torch\serialization.py in _legacy_load(f, map_location, pickle_module, **pickle_load_args)
    700     unpickler = pickle_module.Unpickler(f, **pickle_load_args)
    701     unpickler.persistent_load = persistent_load
--> 702     result = unpickler.load()
    703
    704     deserialized_storage_keys = pickle_module.load(f, **pickle_load_args)

~\.conda\envs\py36\lib\site-packages\torch\serialization.py in persistent_load(saved_id)
    661         location = _maybe_decode_ascii(location)
    662         if root_key not in deserialized_objects:
--> 663             obj = data_type(size)
    664             obj._torch_load_uninitialized = True
    665             deserialized_objects[root_key] = restore_location(obj, location)

RuntimeError: [enforce fail at …\c10\core\CPUAllocator.cpp:47] ((ptrdiff_t)nbytes) >= 0. alloc_cpu() seems to have been called with negative number: 18446744066554005248

This sounds like a bug or some limitation. Either way, it shouldn't fail with an assertion failure; the framework should be able to provide a nice error message. Could you file an issue at http://github.com/pytorch/pytorch/?

I filed the issue.

Do you have any idea how to solve the problem? Also, do you think it comes from the save or from the load? Both files having the same size on disk when they shouldn't seems suspicious, but I'm not sure.