Version Problem, how to make PyTorch 1.2 and 1.3 work together

Rizhao_Cai · February 12, 2020, 1:17pm

Hi,

I have a version problem and need your help.

I am using the distiller to prune my object detection model.
Then I need to deploy the pruned model on a Jetson TX2 with TensorRT 6.0.

However, distiller requires the JIT module of PyTorch 1.3.1 to obtain the graph of a module, so that it can do the actual pruning (not just applying a mask of zeros).

After getting the pruned model, I exported it to ONNX with PyTorch 1.3.1. However, I found that the ONNX parser of TensorRT 6.0 is not compatible with models exported by PyTorch 1.3.1, only with those exported by PyTorch 1.2.
I have raised this issue on GitHub.

Now I am stuck.
How can I make them work together?

I have tried two ways, but both of them failed.
I created two conda environments: one with PyTorch 1.3.1 and distiller, called env1.3, and another with PyTorch 1.2.0, called env1.2.

Under env1.3, I saved the whole model (not just the state_dict) with torch.save and pickle.dump, but when I tried to load it under env1.2 with torch.load or pickle.load, an error occurred saying that there is no module called distiller.
Under env1.3, I used torch.jit.trace to get a ScriptModule and then saved it to disk with torch.jit.save. However, when I used torch.jit.load under env1.2 to load the JIT model, I got:

terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:137] . PytorchStreamReader failed closing reader: file not found
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x47 (0x7f9f1c67ee17 in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: caffe2::serialize::PyTorchStreamReader::valid(char const*) + 0x6b (0x7f9f1f60775b in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #2: caffe2::serialize::PyTorchStreamReader::~PyTorchStreamReader() + 0x1f (0x7f9f1f6077af in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #3: <unknown function> + 0x3c17637 (0x7f9f206e6637 in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #4: torch::jit::import_ir_module(std::shared_ptr<torch::jit::script::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&) + 0x1d0 (0x7f9f206ed220 in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch.so)
frame #5: <unknown function> + 0x4d69dc (0x7f9f66ddf9dc in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1d3ef4 (0x7f9f66adcef4 in /home/rizhao/anaconda3/envs/torch/lib/python3.6/site-packages/torch/lib/libtorch_python.so)

frame #22: __libc_start_main + 0xf0 (0x7f9f75830830 in /lib/x86_64-linux-gnu/libc.so.6)

Aborted (core dumped)

Now I have no idea how to solve this. Any clues would be much appreciated.
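
A note on the first attempt: the "no module called distiller" error is ordinary pickle behaviour rather than something specific to PyTorch 1.2. Pickle stores a class as a reference to its module and name, so whatever environment loads the file must be able to import that module. The sketch below reproduces this with the standard library only. The module name fake_pruner and the class Wrapper are made up for illustration and merely stand in for a package that is installed in one environment but not the other.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a package that exists only in the saving environment.
fake = types.ModuleType("fake_pruner")


class Wrapper:
    """Toy object playing the role of a model whose class lives in fake_pruner."""

    def __init__(self, weights):
        self.weights = weights


# Make the class look as if it had been defined inside fake_pruner.
Wrapper.__module__ = "fake_pruner"
fake.Wrapper = Wrapper
sys.modules["fake_pruner"] = fake

model = Wrapper([1.0, 0.0, 2.0])
whole_object = pickle.dumps(model)                      # stores "fake_pruner.Wrapper" by reference
plain_data = pickle.dumps({"weights": model.weights})   # stores only built-in types

# Simulate the second environment, where the package is not installed.
del sys.modules["fake_pruner"]

try:
    pickle.loads(whole_object)
    load_error = None
except ModuleNotFoundError as exc:
    load_error = exc

restored = pickle.loads(plain_data)
print("whole object:", repr(load_error))
print("plain data:  ", restored)
```

The same reasoning suggests that saving only plain data (for a model, the state_dict) avoids the dependency on distiller at load time. That is an assumption to verify for this case: the loading environment would still need a model definition whose layer shapes match the pruned weights.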

driazati · February 13, 2020, 6:39pm

Are you able to upload the binary (maybe to a GitHub repo or something) you are trying to load so we can try to reproduce this issue?

Rizhao_Cai · February 14, 2020, 9:06am

Thanks. Can I just upload the ONNX model exported by PyTorch 1.3 or PyTorch 1.2? Besides, I can also upload the exported JIT module.

driazati · February 18, 2020, 6:41pm

Upload everything you can, or at the very least the model that causes the error to be thrown when you load it.
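
When sharing a saved model for reproduction, it can help to post a short description of the file alongside it. The helper below is a minimal sketch using only the standard library. It assumes the file written by torch.jit.save is a zip container, which I believe holds for recent PyTorch releases but have not checked against these exact 1.2 and 1.3 builds. The function name describe_model_file is made up for this example.

```python
import hashlib
import zipfile
from pathlib import Path


def describe_model_file(path):
    """Return size, SHA-256 digest and, for zip archives, the sorted entry names."""
    path = str(path)
    data = Path(path).read_bytes()
    info = {
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "is_zip": zipfile.is_zipfile(path),
        "entries": [],
    }
    if info["is_zip"]:
        # List the records stored in the archive without extracting them.
        with zipfile.ZipFile(path) as archive:
            info["entries"] = sorted(archive.namelist())
    return info
```

Calling describe_model_file on the file saved under env1.3 and pasting the result next to the upload lets others confirm they downloaded the same bytes, and the entry list shows which records the archive actually contains.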