It does seem like a bit of a problem! Is there anything else that comes to mind, or am I out of luck?
Also, could I ask another question about some errors I get? Occasionally I seem to run into an issue when loading my model.
Traceback (most recent call last):
File "~/main.py", line 145, in <module>
state_dict = torch.load(f=model_path_pt, map_location=torch.device(device))
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 595, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 764, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
Traceback (most recent call last):
File "~/main.py", line 145, in <module>
state_dict = torch.load(f=model_path_pt, map_location=torch.device(device))
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 845, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "~/.local/lib/python3.6/site-packages/torch/serialization.py", line 833, in load_tensor
storage = zip_file.get_storage_from_record(name, size, dtype).storage()
RuntimeError: [enforce fail at inline_container.cc:145] . PytorchStreamReader failed reading file data/67511648: file read failed
For the first error, it seems the file is just 0 MB in size; is that correct? I only say this from reading this thread on Stack Overflow here. For the second one, I'm not 100% sure what's wrong. I did read your previous answer here, but I'm saving everything within a dictionary rather than saving the model directly, like this…
torch.save({'epoch': preepoch,
            'model_state_dict': net.state_dict(),
            'optim_state_dict': optim.state_dict(),
            'loss': mean_preloss,
            'chains': sampler.chains}, model_path_pt)
and then loaded with
state_dict = torch.load(f=model_path_pt, map_location=torch.device(device))
start = state_dict['epoch'] + 1
net.load_state_dict(state_dict['model_state_dict'])
optim.load_state_dict(state_dict['optim_state_dict'])
loss = state_dict['loss']
sampler.chains = state_dict['chains']
Thank you!
Edit: A follow-up question on the PytorchStreamReader error: I save my model each epoch, and each epoch takes around 0.3 s. Is it advisable to save at each epoch, or only every n-th epoch? Could writing the file every 0.3 s be causing the issue with reading it? The error also varies a bit: sometimes it's "failed finding central directory", "invalid header or archive is corrupted", or "file read failed"!