Leak in MKL sgetri

Hi, I’m tracking down a memory leak, and one of the leaks that valgrind found is in the minimal working example below, with the stack trace pasted after it.

I’m using PyTorch 0.4.1 on Ubuntu 18.04, and I saw the same issue when I tried with master at a7eee0a1e.

Is this just a benign problem that wastes 1 megabyte a single time? My other leaks also stem from the multi-threaded mkl_lapack_strtri, and I’m trying to tell whether these are two separate issues.

import torch

x = torch.randn(256, 256)
y = 0

# each inverse() call goes through MKL's sgetri, which valgrind flags below
for i in range(10):
    y += x.inverse().sum()

print(y)
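
In case it helps narrow things down, here is the same loop pinned to a single thread; this is only a guess on my part that the leak is tied to the OpenMP threading in libiomp5, not a known workaround.

import os

# Limit MKL/OpenMP to one thread before importing torch, to see whether the
# libiomp5 task allocations in the trace below still show up under valgrind.
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import torch

torch.set_num_threads(1)  # also cap PyTorch's own intra-op thread pool

x = torch.randn(256, 256)
y = 0

for i in range(10):
    y += x.inverse().sum()

print(y)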

==8901== 1,048,576 bytes in 1 blocks are definitely lost in loss record 2,138 of 2,138
==8901==    at 0x4C2FB0F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8901==    by 0xD2B0441: _INTERNAL_23_______src_kmp_alloc_cpp_fbba096b::bget(kmp_info*, long) (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD2B016D: ___kmp_fast_allocate (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD34BC56: __kmp_task_alloc (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD34BB96: __kmpc_omp_task_alloc (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0x130CDEA9: mkl_lapack_strtri (in /home/jongwook/anaconda3/lib/libmkl_intel_thread.so)
==8901==    by 0xD357A42: __kmp_invoke_microtask (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD31ACD9: __kmp_invoke_task_func (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD31C5B5: __kmp_fork_call (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0xD2DABAF: __kmpc_fork_call (in /home/jongwook/anaconda3/lib/libiomp5.so)
==8901==    by 0x130CD4A6: mkl_lapack_strtri (in /home/jongwook/anaconda3/lib/libmkl_intel_thread.so)
==8901==    by 0xEBD5023: mkl_lapack_sgetri (in /home/jongwook/anaconda3/lib/libmkl_core.so)
==8901==    by 0x14AFAFB6: SGETRI (in /home/jongwook/anaconda3/lib/libmkl_intel_lp64.so)
==8901==    by 0x22804C48: THFloatLapack_getri (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x227CAC7E: THFloatTensor_getri (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x221A2DE1: at::CPUFloatType::_getri_out(at::Tensor&, at::Tensor const&) const (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x2205FA04: at::native::inverse_out(at::Tensor&, at::Tensor const&) (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x2205FE0B: at::native::inverse(at::Tensor const&) (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x2225B420: at::Type::inverse(at::Tensor const&) const (in /home/jongwook/anaconda3/lib/python3.6/site-packages/torch/lib/libcaffe2.so)
==8901==    by 0x1CE80ED9: torch::autograd::VariableType::inverse(at::Tensor const&) const (VariableType.cpp:24790)
==8901==    by 0x1D115671: inverse (TensorMethods.h:973)
==8901==    by 0x1D115671: dispatch_inverse (python_variable_methods_dispatch.h:1210)
==8901==    by 0x1D115671: torch::autograd::THPVariable_inverse(_object*, _object*) (python_variable_methods.cpp:2937)
==8901==    by 0x4F0805A: _PyCFunction_FastCallDict (methodobject.c:192)
==8901==    by 0x4FA1499: call_function (ceval.c:4830)
==8901==    by 0x4FA571B: _PyEval_EvalFrameDefault (ceval.c:3328)
==8901==    by 0x4FA109D: _PyEval_EvalCodeWithName (ceval.c:4159)
==8901==    by 0x4FA16CC: PyEval_EvalCodeEx (ceval.c:4180)
==8901==    by 0x4FA171A: PyEval_EvalCode (ceval.c:731)
==8901==    by 0x4FDD0A1: run_mod (pythonrun.c:1025)
==8901==    by 0x4FDD0A1: PyRun_FileExFlags (pythonrun.c:978)
==8901==    by 0x4FDD206: PyRun_SimpleFileExFlags (pythonrun.c:420)
==8901==    by 0x4FF96FC: run_file (main.c:340)
==8901==    by 0x4FF96FC: Py_Main (main.c:810)
==8901==    by 0x400BBB: main (python.c:69)