Problem with torch.load and torch.jit.load in Python 3.7.4 and torch 1.2.0

I am trying to load the pretrained HybridPose weights for the ape object of the Linemod dataset.

I get this error if I use torch.load:

(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --load_dir saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape
number of model parameters: 12959563
loading checkpoint from saved_weights/linemod/ape/checkpoints/0.001/199
> /home/mona/HP/HybridPose/lib/utils.py(32)load_session()
-> print('Could not restore session properly, check the load_dir')
(Pdb) quit()
Traceback (most recent call last):
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 187, in nti
    n = int(s.strip() or "0", 8)
ValueError: invalid literal for int() with base 8: 's\n_rebui'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 2289, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 1095, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 1037, in frombuf
    chksum = nti(buf[148:156])
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 189, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/serialization.py", line 555, in _load
    return legacy_load(f)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/serialization.py", line 466, in legacy_load
    with closing(tarfile.open(fileobj=f, mode='r:', format=tarfile.PAX_FORMAT)) as tar, \
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 1591, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 1621, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 1484, in __init__
    self.firstmember = self.next()
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/tarfile.py", line 2301, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mona/HP/HybridPose/lib/utils.py", line 25, in load_session
    model.load_state_dict(torch.load(os.path.join(args.load_dir, 'model.pth')))
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/serialization.py", line 386, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/serialization.py", line 559, in _load
    raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name))
RuntimeError: saved_weights/linemod/ape/checkpoints/0.001/199/model.pth is a zip archive (did you mean to use torch.jit.load()?)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/train_core.py", line 102, in <module>
    model, optimizer, start_epoch = setup_model(args)
  File "src/train_core.py", line 88, in setup_model
    model, optimizer, start_epoch = load_session(model, optimizer, args)
  File "/home/mona/HP/HybridPose/lib/utils.py", line 32, in load_session
    print('Could not restore session properly, check the load_dir')
  File "/home/mona/HP/HybridPose/lib/utils.py", line 32, in load_session
    print('Could not restore session properly, check the load_dir')
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit

The checkpoint directory contains:

(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ ls saved_weights/linemod/ape/checkpoints/0.001/199
total 149M
drwx------ 12 mona mona 4.0K Oct 23 15:23 ..
-rw-------  1 mona mona  50M Oct 23 15:23 model.pth
drwx------  2 mona mona 4.0K Oct 23 15:23 .
-rw-------  1 mona mona  99M Oct 23 15:23 optim.pth

If I use torch.jit.load:

def load_session(model, optim, args):
    try:
        start_epoch = int(args.load_dir.split('/')[-1]) + 1
        # model.load_state_dict(torch.load(os.path.join(args.load_dir, 'model.pth')))
        # optim.load_state_dict(torch.load(os.path.join(args.load_dir, 'optim.pth')))
        model.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'model.pth')))
        optim.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'optim.pth')))
        for param_group in optim.param_groups:
            param_group['lr'] = args.lr
        print('Successfully loaded model from {}'.format(args.load_dir))
    except Exception as e:
        pdb.set_trace()
        print('Could not restore session properly, check the load_dir')

    return model, optim, start_epoch

I get:

(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ LD_LIBRARY_PATH=lib/regressor:$LD_LIBRARY_PATH python src/train_core.py --load_dir /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199 --object_name ape
number of model parameters: 12959563
loading checkpoint from /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
> /home/mona/HP/HybridPose/lib/utils.py(34)load_session()
-> print('Could not restore session properly, check the load_dir')
(Pdb) quit()
Traceback (most recent call last):
  File "/home/mona/HP/HybridPose/lib/utils.py", line 27, in load_session
    model.load_state_dict(torch.jit.load(os.path.join(args.load_dir, 'model.pth')))
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/jit/__init__.py", line 162, in load
    cpp_module = torch._C.import_ir_module(cu, f, map_location, _extra_files)
RuntimeError: version_number <= kMaxSupportedFileFormatVersion INTERNAL ASSERT FAILED at /tmp/pip-req-build-58y_cjjl/caffe2/serialize/inline_container.cc:131, please report a bug to PyTorch. Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. (init at /tmp/pip-req-build-58y_cjjl/caffe2/serialize/inline_container.cc:131)
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x6d (0x7fb30d0091cd in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::init() + 0x246d (0x7fb2defafe9d in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x69 (0x7fb2defb1359 in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #3: torch::jit::import_ir_module(std::shared_ptr<torch::jit::script::CompilationUnit>, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x4d (0x7fb2e0119ddd in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch.so)
frame #4: <unknown function> + 0x51031b (0x7fb2ff51031b in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x1c7126 (0x7fb2ff1c7126 in /home/mona/anaconda3/envs/hp/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #24: <unknown function> + 0x29d90 (0x7fb30f629d90 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: __libc_start_main + 0x80 (0x7fb30f629e40 in /lib/x86_64-linux-gnu/libc.so.6)


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "src/train_core.py", line 102, in <module>
    model, optimizer, start_epoch = setup_model(args)
  File "src/train_core.py", line 88, in setup_model
    model, optimizer, start_epoch = load_session(model, optimizer, args)
  File "/home/mona/HP/HybridPose/lib/utils.py", line 34, in load_session
    print('Could not restore session properly, check the load_dir')
  File "/home/mona/HP/HybridPose/lib/utils.py", line 34, in load_session
    print('Could not restore session properly, check the load_dir')
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 88, in trace_dispatch
    return self.dispatch_line(frame)
  File "/home/mona/anaconda3/envs/hp/lib/python3.7/bdb.py", line 113, in dispatch_line
    if self.quitting: raise BdbQuit
bdb.BdbQuit
(hp) mona@mona-ThinkStation-P7:~/HP/HybridPose$ ls /home/mona/HP/HybridPose/saved_weights/linemod/ape/checkpoints/0.001/199
total 149M
drwx------ 12 mona mona 4.0K Oct 23 15:23 ..
-rw-------  1 mona mona  50M Oct 23 15:23 model.pth
drwx------  2 mona mona 4.0K Oct 23 15:23 .
-rw-------  1 mona mona  99M Oct 23 15:23 optim.pth

Also, here’s the link to the repository:

This is very strange, since I waited until the entire zip file had downloaded before extracting it. The author told me to check the md5sum, and it turned out the file was corrupted. After re-downloading it and using torch.load, the problem was resolved.
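For anyone hitting the same thing, the checksum comparison can also be done from Python before calling torch.load. This is only a minimal sketch: the helper names `md5_of_file` and `matches_checksum` are made up for illustration, and the expected digest has to come from whoever published the weights.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare the file's digest with a published one (hypothetical helper).

    The comparison ignores case and surrounding whitespace in `expected_md5`.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

If `matches_checksum` returns False for the downloaded archive, re-downloading it is probably the first thing to try before debugging the loading code.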

Attempted to read a PyTorch file with version 3, but the maximum supported version for reading is 1. Your PyTorch installation may be too old. 

You might need to update PyTorch to a newer release since 1.2.0 is quite old by now.

That’s strange: the torch.jit.load error indicates that the file itself can be read, but that the installed PyTorch version is just too old for it.
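To see which situation applies, the checkpoint can be inspected with the standard library alone. The sketch below is not part of HybridPose or PyTorch; `describe_checkpoint` is a hypothetical helper, and it assumes that zip-based PyTorch files keep their format version in an archive member named `version`, which may not hold for every release.

```python
import zipfile


def describe_checkpoint(path):
    """Report whether `path` is a zip archive and, if present, its version record."""
    if not zipfile.is_zipfile(path):
        # Not a zip archive: an older-style pickle/tar checkpoint or a damaged file.
        return {"is_zip": False, "version": None}
    version = None
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            # Assumption: the format version lives in a member called "version".
            if name.split("/")[-1] == "version":
                version = archive.read(name).decode("ascii", "replace").strip()
                break
    return {"is_zip": True, "version": version}
```

If this reports a zip archive whose version is higher than the installed release supports, upgrading PyTorch is the likely fix; `print(torch.__version__)` shows which release is installed.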
