Error when loading a saved module

Environment: Python 3.6, CUDA 8, PyTorch (latest from pytorch.org), Ubuntu 16.04

I trained a model on cuda:2 and saved it with save(myModule, "xxxx.model").

Today, when I tried to reload it, the process crashed with the exception below:

THCudaCheck FAIL file=/pytorch/torch/csrc/generic/serialization.cpp line=105 error=2 : out of memory
terminate called after throwing an instance of 'c10::Error'
what(): owning_ptr == NullType::singleton() || owning_ptr->refcount_.load() > 0 ASSERT FAILED at /pytorch/c10/util/intrusive_ptr.h:341, please report a bug to PyTorch. intrusive_ptr: Can only intrusive_ptr::reclaim() owning pointers that were created using intrusive_ptr::release(). (reclaim at /pytorch/c10/util/intrusive_ptr.h:341)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f3172ebbfe1 in /home/songxuemeng/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7f3172ebbdfa in /home/songxuemeng/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: THStorage_free + 0xca (0x7f30f3460aea in /home/songxuemeng/.local/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
frame #3: + 0x4b62f7 (0x7f316e18b2f7 in /home/songxuemeng/.local/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #4: python3.6() [0x541040]
frame #5: python3.6() [0x572c20]
frame #6: python3.6() [0x4e03e1]
frame #7: python3.6() [0x5a6978]
frame #8: python3.6() [0x5b3278]
frame #9: python3.6() [0x5b324e]
frame #10: python3.6() [0x5b324e]
frame #11: _PyEval_EvalFrameDefault + 0x583e (0x50c5fe in python3.6)
frame #12: python3.6() [0x5058a4]
frame #13: python3.6() [0x5066f0]
frame #14: _PyEval_EvalFrameDefault + 0x4de (0x50729e in python3.6)
frame #15: python3.6() [0x504232]
frame #16: PyEval_EvalCode + 0x23 (0x6022e3 in python3.6)
frame #17: python3.6() [0x647fa2]
frame #18: PyRun_FileExFlags + 0x9a (0x64806a in python3.6)
frame #19: PyRun_SimpleFileExFlags + 0x197 (0x649d97 in python3.6)
frame #20: Py_Main + 0x5c2 (0x63c352 in python3.6)
frame #21: main + 0xe9 (0x4dbcb9 in python3.6)
frame #22: __libc_start_main + 0xf0 (0x7f3179af7830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #23: _start + 0x29 (0x5cb639 in python3.6)

I'm sure that cuda:2 has enough memory to load this model (I checked with nvidia-smi).
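
For reference, here is a minimal sketch of a save/reload round trip with an explicit target device. By default torch.load puts tensors back on the device they were saved from, and the map_location argument redirects them. Several things here are assumptions and not taken from the setup above: the nn.Linear model is a hypothetical stand-in for myModule, the checkpoint goes to a temporary directory, and the sketch saves the state_dict instead of the whole module. Whether this avoids the out-of-memory failure in the trace is not confirmed.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in for the real model (called myModule in the post).
my_module = nn.Linear(4, 2)

# Temporary location for the checkpoint; the file name mirrors the post.
path = os.path.join(tempfile.mkdtemp(), "xxxx.model")

# Assumption: save only the parameters (state_dict), not the whole module.
torch.save(my_module.state_dict(), path)

# Use cuda:2 when at least three GPUs are visible, otherwise fall back to CPU.
device = torch.device("cuda:2" if torch.cuda.device_count() > 2 else "cpu")

# map_location tells torch.load where to place the stored tensors,
# instead of the device they were saved from.
state = torch.load(path, map_location=device)

# Rebuild the module and copy the loaded parameters into it.
reloaded = nn.Linear(4, 2)
reloaded.load_state_dict(state)
reloaded.to(device)
```

Passing map_location="cpu" first and moving the module to the GPU afterwards is another option when the target GPU is nearly full.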
