Torch.stack is very slow and causes a CUDA error

Hello, I want to use the output of my RNN as a tensor for further computation. But when I convert the list of tensors to a single tensor using torch.stack, it becomes very slow and consumes a lot of memory. Is there any way I can optimize this loop? Thanks for your reply.

output = []
hx_0 = Variable(torch.randn(1000, 4))
for i in range(1000):
    hx = rnn(input[:, i, :], hx_0)
    output.append(hx)
    outs1 = torch.stack(output, 1)
    hx_0 = operation(outs1)

print(output)

You could pre-allocate output as an empty tensor and fill it using the index.
Something like this should work:

output = torch.empty(hx.size(0), 1000)
for i in range(1000):
    output[:, i] = hx

I’m not sure how hx is shaped in your case, so you would probably need to adapt your code a bit.

Thank you @ptrblck for your answer. However, my operation requires a tensor of shape Batch x i x Hidden as input and produces Batch x Hidden. The reason is that I want to compute the new hx_0 as a function of the hidden states (hn) stored in out. The shape at each step is provided below. Thank you.

rnn = nn.GRUCell(50, 4)  # Embedding_dim x Hidden
input = Variable(torch.randn(1000, 32, 50))  # Sequence_len x Batch x Embedding_dim
input = input.transpose(1, 0)  # Batch x Sequence_len x Embedding_dim
hx_0 = Variable(torch.randn(32, 4))  # Batch x Hidden

out = []
start = time.time()
for i in range(1000):
    hn = rnn(input[:, i, :], hx_0)
    # print("hn size", hn.size())  # Batch x Hidden
    out.append(hn.view([input.size(0), 1, 4]))

    # outs1 = torch.stack(out, 1)
    # print("output of stack size", outs1.size())  # Batch x i x Hidden
    hx_0 = operation(4)(out)
    print("output size after operation", hx_0.size())  # Batch x Hidden

output = out[-1]
print("outs after gru", output.size())  # Batch x Hidden
# print("Time is:", time.time() - start)

In addition, if I try your suggestion output[:, i] = hx, the following error is thrown:

expand(torch.FloatTensor{[32, 4]}, size=[32]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
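
If it helps, I think the mismatch comes from the suggested buffer being 2-D: output[:, i] is then a 1-D slice of shape [32], while hn has shape [32, 4]. Here is a tiny repro, with random tensors standing in for my rnn output:

import torch

batch_size, hidden = 32, 4
hn = torch.randn(batch_size, hidden)  # Batch x Hidden, like the GRUCell output

# 2-D buffer: output_2d[:, 0] is a 1-D slice of shape [32],
# so assigning the [32, 4] hidden state raises the expand error above
output_2d = torch.empty(batch_size, 1000)
# output_2d[:, 0] = hn  # RuntimeError: expand(...)

# 3-D buffer with an explicit hidden dimension accepts the assignment
output_3d = torch.empty(batch_size, 1000, hidden)
output_3d[:, 0, :] = hn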

Thanks for the info regarding the shapes!
Could you try the following code and see if it works:

batch_size = 32
hidden = 4
nb_samples = 1000

output = torch.zeros(batch_size, nb_samples, hidden)
for i in range(nb_samples):
    output[:, i:i+1, :] = torch.randn(batch_size, 1, hidden)
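
Applied to your GRUCell example, a rough sketch could look like the code below. I'm assuming your custom operation can consume the growing slice output[:, :i+1, :]; the .mean(dim=1) here is just a placeholder for it, and if you need gradients through earlier steps you should double-check that the in-place writes work for your backward pass.

import time
import torch
import torch.nn as nn

batch_size, embedding_dim, hidden, seq_len = 32, 50, 4, 1000

rnn = nn.GRUCell(embedding_dim, hidden)
inp = torch.randn(batch_size, seq_len, embedding_dim)  # Batch x Sequence_len x Embedding_dim
hx_0 = torch.randn(batch_size, hidden)                 # Batch x Hidden

# Pre-allocate the buffer once instead of stacking a growing list at every step
output = torch.zeros(batch_size, seq_len, hidden)

start = time.time()
for i in range(seq_len):
    hn = rnn(inp[:, i, :], hx_0)  # Batch x Hidden
    output[:, i, :] = hn
    # Placeholder for your operation: it should map Batch x (i+1) x Hidden -> Batch x Hidden
    hx_0 = output[:, :i + 1, :].mean(dim=1)
print("Time is:", time.time() - start)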

Thanks for your help.