# Index_select() becomes slow with very long columns

Hi, I just found that calling index_select() on long columns (dim=1) are much slower than doing it on long rows (dim=0). In fact, for large enough matrices, it’s much faster to first transpose the matrix and call index_select() on rows!

I wonder if it’s a known performance issue and if there’s any way to mitigate the problem?

Here’s the code I used to test. (I’m using Python 2.7.12, PyTorch 0.2.0.post1, and GTX 1080). It builds a 256*N matrix (for various N) and rearranges its rows. Then it does the same for columns.

``````import time
import numpy as np
import torch

DIM = 256
idxs = np.random.permutation(DIM)
idxs = torch.LongTensor(idxs).cuda()

def do_test(dim, sz, trans=False):
if dim == 0:
# Rearrange 256 rows, each size 'sz'.
A = torch.cuda.FloatTensor(DIM, sz)
else:
# Rearrange 256 columns, each size 'sz'.
A = torch.cuda.FloatTensor(sz, DIM)
A.uniform_(-1.0, 1.0)

B = A.new(A.shape)

torch.cuda.synchronize()

T0 = time.time()
for step in xrange(10):
if trans:
T = A.t().clone()
T.index_select(dim=1-dim, index=idxs, out=B)
else:
A.index_select(dim=dim, index=idxs, out=B)

torch.cuda.synchronize()
T1 = time.time()
print '  %6d : Elapsed %.3f ms' % (sz, (T1 - T0) / 10 * 1000.0)

sizes = [100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000]

print 'Rearranging rows:'
for sz in sizes: do_test(0, sz)
print 'Rearranging columns:'
for sz in sizes: do_test(1, sz)
print 'Rearranging columns (with transpose):'
for sz in sizes: do_test(1, sz, True)
``````

Result:

``````Rearranging rows:
100 : Elapsed 0.051 ms
200 : Elapsed 0.049 ms
500 : Elapsed 0.056 ms
1000 : Elapsed 0.066 ms
2000 : Elapsed 0.084 ms
5000 : Elapsed 0.142 ms
10000 : Elapsed 0.235 ms
20000 : Elapsed 0.419 ms
50000 : Elapsed 0.999 ms
100000 : Elapsed 1.937 ms

Rearranging columns:
100 : Elapsed 0.048 ms
200 : Elapsed 0.051 ms
500 : Elapsed 0.065 ms
1000 : Elapsed 0.111 ms
2000 : Elapsed 0.298 ms
5000 : Elapsed 1.166 ms
10000 : Elapsed 2.953 ms
20000 : Elapsed 7.699 ms
50000 : Elapsed 22.267 ms
100000 : Elapsed 44.207 ms

Rearranging columns (with transpose):
100 : Elapsed 0.051 ms
200 : Elapsed 0.054 ms
500 : Elapsed 0.065 ms
1000 : Elapsed 0.081 ms
2000 : Elapsed 0.118 ms
5000 : Elapsed 0.236 ms
10000 : Elapsed 0.489 ms
20000 : Elapsed 1.472 ms
50000 : Elapsed 8.336 ms
100000 : Elapsed 14.236 ms
``````

As you can see, index_select(dim=1) is much slower than dim=0: more importantly, it grows faster than O(N): N=100000 is about 400 times slower than N=1000, and first transposing the matrix is about three times faster (for N=100000), even considering the time spent on transposing.

Does anyone know what’s going on here?

1 Like