I have run some experiments with the following toy code, but I have a question about the results. Can anyone help me? Thank you very much.
import time
import torch as tc

# dtype = tc.FloatTensor       # CPU
dtype = tc.cuda.FloatTensor    # GPU

x = tc.randn(100000, 50).type(dtype)
w1 = tc.randn(50, 30).type(dtype)

# time a single batched matrix multiplication
s = time.time()
l = x.mm(w1)
e = time.time()
print(e - s)

# time the same product computed row by row in a Python loop
s = time.time()
l = []
for i in range(x.size(0)):
    l.append(x[i].unsqueeze(0).mm(w1).squeeze(0))
l = tc.stack(l, dim=0)
e = time.time()
print(e - s)
If I run this on the CPU, the elapsed times are 0.017 s and 0.696 s. However, when I run it on the GPU, something strange happens: the times increase to 0.314 s and 1.592 s. I would have expected the computation on the GPU to take less time, right?
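I am not sure whether my timing method is even fair for the GPU case. Below is a minimal sketch of how I could time the batched product with an explicit tc.cuda.synchronize() before reading the clock; this assumes CUDA kernels may be launched asynchronously, and I have not rerun my numbers with it:

import time
import torch as tc

x = tc.randn(100000, 50).cuda()
w1 = tc.randn(50, 30).cuda()

tc.cuda.synchronize()   # make sure setup work on the GPU has finished
s = time.time()
l = x.mm(w1)
tc.cuda.synchronize()   # wait for the matmul kernel to complete before stopping the clock
e = time.time()
print(e - s)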
Besides, the for loop takes more time than the single matrix multiplication, which is reasonable. Could someone explain this to me? Thank you.