Segmentation fault when inverting a matrix

I’m getting a segmentation fault when I try to invert a 150x150 matrix, with code as simple as this:

import torch

J = torch.randn(20, 150)
torch.inverse(J.t() @ J)

I didn’t try to find the exact threshold, but, for example, a 120x120 inversion is fine.

Also, I am having this issue on macOS Sierra (10.12.6) with Python 3.7.3, Clang 4.0.1, and torch 1.3.0. My Ubuntu machine with Python 3.7.4, GCC 7.3.0, and torch 1.1.0 has no problem inverting even 3000x3000 matrices.

Any suggestions?
Thanks!

Hi,

I cannot reproduce this locally with Python 3.7 on macOS.
Do you know where the segfault happens exactly? (using gdb or a similar tool)
Also, do you see the same problem with torch.inverse(J.t().clone() @ J)?
Also, how did you install PyTorch?
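If gdb isn’t handy, Python’s standard-library faulthandler module is another way to see where a native crash happens — a minimal sketch (the repro code itself is omitted here):

```python
import faulthandler

# Install a handler that prints a Python-level traceback if the
# process receives SIGSEGV; must be enabled *before* running the
# crashing code.
faulthandler.enable()

print(faulthandler.is_enabled())  # True
```

Enable it at the top of the script, then run the failing torch.inverse call; on a segfault the interpreter prints the traceback of the crashing thread before the process dies.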

Singular matrices expose bugs in numerical linear algebra algorithms…
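For context: in the original snippet J is 20x150, so J.t() @ J is 150x150 but has rank at most 20 — it is exactly singular. A quick check, using NumPy here for illustration (seed chosen arbitrarily):

```python
import numpy as np

np.random.seed(0)
J = np.random.randn(20, 150)
A = J.T @ J  # 150x150 Gram matrix

# rank(J.T @ J) == rank(J) <= min(20, 150) = 20, so A is singular
rank = np.linalg.matrix_rank(A)
print(A.shape, rank)  # shape (150, 150), rank no larger than 20
```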

I’m wondering if your call is going through MKL under the hood. I’ve seen crashes in that library for singular matrices that get fixed by setting OMP_NUM_THREADS=1.
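One way to apply that workaround from inside Python — note the variable has to be set before torch (and hence MKL) is first imported, so setting it at the very top of the script, or in the shell (OMP_NUM_THREADS=1 python repro.py), is the safe option:

```python
import os

# Must happen before `import torch`, otherwise MKL/OpenMP has
# already read the variable and the change has no effect.
os.environ["OMP_NUM_THREADS"] = "1"

# import torch  # imported only after the environment is set
print(os.environ["OMP_NUM_THREADS"])  # 1
```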

The dump utility below could be useful for getting a repro – save the matrix before the crash. Then you can upload the file and wrap it into an easy-to-run repro like here

import numpy as np
import torch

def dump(result, fname):
    """Save result to file. Load as np.genfromtxt(fname, delimiter=',')."""
    # convert torch tensors (or anything array-like) to a numpy array
    if isinstance(result, torch.Tensor):
        result = result.detach().cpu().numpy()
    else:
        result = np.asarray(result)
    if result.shape == ():  # savetxt has problems with scalars
        result = np.expand_dims(result, 0)
    # special handling for integer datatypes
    if np.issubdtype(result.dtype, np.integer):
        np.savetxt(fname, X=result, fmt="%d", delimiter=',')
    else:
        np.savetxt(fname, X=result, delimiter=',')
    print("Dumping to", fname)
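As a sanity check of the dump format (NumPy only, with a temporary file name made up for the example), a dumped integer matrix round-trips through np.genfromtxt:

```python
import os
import tempfile

import numpy as np

A = np.arange(6, dtype=np.int64).reshape(2, 3)
fname = os.path.join(tempfile.mkdtemp(), "matrix.csv")

# same calls the dump() helper makes for integer dtypes
np.savetxt(fname, X=A, fmt="%d", delimiter=',')
B = np.genfromtxt(fname, delimiter=',')  # loaded back as float64

print(np.array_equal(A, B))  # True
```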

I’m using Anaconda3 on my computer. I don’t really remember how I originally installed torch, but I uninstalled it anyway. When I re-installed it via pip, the error persisted. I uninstalled it once again and this time installed it via conda. Now everything seems fine.