Valgrind is my go-to tool for wrangling possible memory leaks. It is a beautiful piece of software, but it is unfortunately (and necessarily) imperfect. I just ran a libtorch-based application, which performs a relatively brief optimization of a CNN model, under valgrind, and the run generated a fair number of loss records. Fortunately, all of them appear to be of the “possibly lost” variety (as opposed to “definitely lost”); I was running with all of valgrind’s leak-check heuristics enabled.
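For concreteness, the style of invocation I mean is roughly the following (the binary name my_app is just a placeholder, not my actual command line):

```shell
# Full leak check, report every leak kind, and enable all of
# memcheck's reachability heuristics (stdstring, length64,
# newarray, multipleinheritance).
valgrind --leak-check=full \
         --show-leak-kinds=all \
         --leak-check-heuristics=all \
         ./my_app
```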

However, a number of records do not, at least going by the traceback, reflect an allocation related to thread creation. I quote the largest of these here, with much of the intermediate traceback cut out. (Apologies for the messy output.)

==4165== 1,441,792 bytes in 1 blocks are possibly lost in loss record 13,570 of 13,578

==4165== at 0x4C307EF: operator new(unsigned long) (vg_replace_malloc.c:344)

==4165== by 0x12BE8C16: void std::vector<mkldnn_primitive*, std::allocator<mkldnn_primitive*> >::_M_range_insert<__gnu_cxx::__normal_iterator<mkldnn_primitive* const*, std::vector<mkldnn_primitive*, std::allocator<mkldnn_primitive*> > > >(__gnu_cxx::__normal_iterator<mkldnn_primitive**, std::vector<mkldnn_primitive*, std::allocator<mkldnn_primitive*> > >, __gnu_cxx::__normal_iterator<mkldnn_primitive* const*, std::vector<mkldnn_primitive*, std::allocator<mkldnn_primitive*> > >, __gnu_cxx::__normal_iterator<mkldnn_primitive* const*, std::vector<mkldnn_primitive*, std::allocator<mkldnn_primitive*> > >, std::forward_iterator_tag) (in /usr/lib64/libtorch_cpu.so)

…

…

==4165== by 0x103E262C: c10::detail::wrap_kernel_functor_unboxed<c10::detail::WrapRuntimeKernelFunctor<std::tuple<at::Tensor, at::Tensor, at::Tensor> (*)(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, std::array<bool, 3ul>), std::tuple<at::Tensor, at::Tensor, at::Tensor>, c10::guts::typelist::typelist<at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, std::array<bool, 3ul> > >, std::tuple<at::Tensor, at::Tensor, at::Tensor> (at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, std::array<bool, 3ul>)>::call(c10::OperatorKernel*, at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, std::array<bool, 3ul>) (in /usr/lib64/libtorch_cpu.so)

==4165== by 0x11D0AE43: torch::autograd::VariableType::mkldnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, std::array<bool, 3ul>) (in /usr/lib64/libtorch_cpu.so)

Of course, “possibly lost” in valgrind may just mean that the allocation has been “hidden” in some way that the available heuristics can’t penetrate. (In my experience, this kind of hiding does not occur with shared_ptr or similar constructs.) So I’m not asking for a detailed interpretation of this particular record; rather, I was wondering whether libtorch (v. 1.5, specifically) has been subjected to any kind of careful memory-leak vetting, whether by valgrind or some other checker.

Part of my interest may just be neurotic purism, but that purism has gotten me to the point where some quite complex applications, involving hundreds of millions of alloc/free pairs, produce no loss records of any kind (which automatically triggers a fist-pump on my part).

Thanks,

Eric