Dynamic Programming: Longest Common Subsequence


I am currently wondering how could we make a efficient implementation of LCS problem.
I found a way to find consecutive match (i.e. ngrams match) with tensor operation by comparing and shifting.
With two sequences x (len: n), y (len: m), the matrix:
e = x.eq(y.unsqueeze(1)) # [n x m]

we have: e[i, j] == 1 <=> x[i] == y[j], a n-gram match will be visible as a diagonal of ones.
Thus we can do the following:

# match_1 = [n x m] of {0, 1}
match_1 = x.eq(y.unsqueeze(1))

# match_2 = [(n-1) x (m-1)] matrix of {0, 1}
match_2 = match_1[:-1, :-1] * match_1[1:, 1:]

# etcetc

The LCS problem is more complicated as we allow gaps. It can be implemented using dynamic programing, in O(n x m), but it is not really protable to Pytorch right?
I tried, it’s super slow.

# considering two "sentences" (LongTensors of word indices)
# _ts, _tr of respective length n and m
table = torch.zeros(n+1, m+1)
_ts, _tr = ts, tr
for i in range(1, n+1):
    for j in range(1, m+1):
        if _ts[i-1] == _tr[j-1]:
            _table[i, j] = _table[i-1, j-1] + 1 
            _table[i, j] = max(_table[i-1, j], _table[i, j-1])
 lcs = _table[n][m]

Any idea to make it more efficient?