# Counterintuitive matrix-vector multiplication running time

I got a counterintuitive result when measuring the running time of the following matrix-vector multiplication.

In the first case, we do a normal matrix-vector multiplication:

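``````
import time
import torch

Khh = torch.rand((5, 1, 10001, 10001)).cuda()
uh = torch.rand((5, 1, 10001)).cuda()
h = 1.0  # placeholder scale factor: h is not defined in the original post

niter = 1000
times = []

for i in range(niter):
    # Synchronize before and after so the timer covers only this GPU work.
    torch.cuda.synchronize()
    start_time = time.time()
    wh = torch.einsum('bcmn,bcn->bcm', Khh, uh) * h
    torch.cuda.synchronize()
    end_time = time.time()
    elapsed = end_time - start_time
    times.append(elapsed)

# Average time per iteration, in seconds.
print(sum(times) / niter)
``````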

The average time per iteration on my machine is 0.003968036651611328 s.

In the second case, I downsampled Khh (and uh) along the last axis:

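``````
# Reuses time, torch, Khh and uh from the first snippet.
H = 1.0  # placeholder scale factor: H is not defined in the original post

niter = 1000
times = []

for i in range(niter):
    start_time = time.time()
    # Keep every second entry along the last axis.
    KhH = Khh[..., ::2]
    uH = uh[..., ::2]
    wh = torch.einsum('bcmn,bcn->bcm', KhH, uH) * H
    end_time = time.time()
    elapsed = end_time - start_time
    times.append(elapsed)

# Average time per iteration, in seconds.
print(sum(times) / niter)
``````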

The average time per iteration in the second case is 0.0053208649158477785 s.

The result is confusing, since the second case performs fewer multiplications and additions.
I don't know why it takes more time than the first one.

Could you describe why you removed the needed synchronizations in the second example? Also, it would be interesting to profile the actual matrix multiplication without the slicing kernel.
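One way to follow up on this suggestion is sketched below. It relies on the fact that `Khh[..., ::2]` is a strided view rather than a copy, so the sliced operands are not contiguous in memory. How `torch.einsum` handles such operands internally (for example, whether it copies them first) is not confirmed anywhere in this thread. The helper `time_op`, the reduced size `n = 1001`, and the CPU fallback are assumptions made so the sketch runs on any machine.

```python
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

def sync():
    # Wait for pending GPU work; does nothing on CPU.
    if device == "cuda":
        torch.cuda.synchronize()

def time_op(fn, niter=20, warmup=3):
    """Average wall-clock seconds per call of fn (hypothetical helper)."""
    for _ in range(warmup):
        fn()
    sync()
    start = time.perf_counter()
    for _ in range(niter):
        fn()
    sync()
    return (time.perf_counter() - start) / niter

# Smaller than the 10001 x 10001 case in the post (assumption, for portability).
n = 1001
Khh = torch.rand((5, 1, n, n), device=device)
uh = torch.rand((5, 1, n), device=device)

# Slicing only creates strided views; no data is copied here.
KhH_view = Khh[..., ::2]
uH_view = uh[..., ::2]

# Materialize contiguous copies once, outside the timed region.
KhH = KhH_view.contiguous()
uH = uH_view.contiguous()

t_full = time_op(lambda: torch.einsum('bcmn,bcn->bcm', Khh, uh))
t_view = time_op(lambda: torch.einsum('bcmn,bcn->bcm', KhH_view, uH_view))
t_contig = time_op(lambda: torch.einsum('bcmn,bcn->bcm', KhH, uH))

print(f"full: {t_full:.6f} s")
print(f"strided view: {t_view:.6f} s")
print(f"contiguous half: {t_contig:.6f} s")
```

If the contiguous half-size product turns out faster than the full one while the strided view does not, that would point to the strided access (or an internal copy), not the arithmetic, as the extra cost.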

Sorry, I forgot those lines, but after adding the synchronizations the result is still the same.

``````
import time
import torch

Khh = torch.rand((5, 1, 10001, 10001)).cuda()
uh = torch.rand((5, 1, 10001)).cuda()
h = 1.0  # placeholder scale factor: h is not defined in the original post
H = 1.0  # placeholder scale factor: H is not defined in the original post

# Case 1: full matrix-vector product.
niter = 1000
times = []

for i in range(niter):
    torch.cuda.synchronize()
    start_time = time.time()
    wh = torch.einsum('bcmn,bcn->bcm', Khh, uh) * h
    torch.cuda.synchronize()
    end_time = time.time()
    elapsed = end_time - start_time
    times.append(elapsed)

print(sum(times) / niter)

# Case 2: product on operands downsampled along the last axis.
niter = 1000
times = []

for i in range(niter):
    torch.cuda.synchronize()
    start_time = time.time()
    KhH = Khh[..., ::2]
    uH = uh[..., ::2]
    wh = torch.einsum('bcmn,bcn->bcm', KhH, uH) * H
    torch.cuda.synchronize()
    end_time = time.time()
    elapsed = end_time - start_time
    times.append(elapsed)

print(sum(times) / niter)
``````

The results are as follows (average seconds per iteration, first case then second case):

``````
0.0036851165294647216
0.0076900506019592285
``````

Thank you
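A rough way to read the two synchronized timings above is to estimate the memory traffic involved. The arithmetic below assumes float32 tensors (the default dtype of `torch.rand`) and that the full product reads `Khh` from memory exactly once. Treating the operation as limited by memory traffic rather than by arithmetic is an assumption, not something measured in this thread.

```python
# Back-of-the-envelope estimate based on the timings reported in the thread.
batch, chan, m, n = 5, 1, 10001, 10001
bytes_per_float32 = 4  # torch.rand produces float32 by default

# Size of Khh in bytes.
matrix_bytes = batch * chan * m * n * bytes_per_float32

t_full = 0.0036851165294647216    # reported average, full product (s)
t_sliced = 0.0076900506019592285  # reported average, sliced product (s)

# Bytes per second if Khh is read exactly once per product (assumption).
effective_bandwidth = matrix_bytes / t_full
slowdown = t_sliced / t_full

print(f"Khh size: {matrix_bytes / 1e9:.2f} GB")
print(f"effective bandwidth, full product: {effective_bandwidth / 1e9:.0f} GB/s")
print(f"sliced / full time ratio: {slowdown:.2f}")
```

Under these assumptions the full product is already moving about 2 GB per call, so its cost is plausibly dominated by reading `Khh`. A stride-2 slice still touches the same memory region, which may be why halving the arithmetic does not halve the time. This is only one possible reading of the numbers.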