Torch.stack is very slow and causes a CUDA error

Hello, I want to use the output of my RNN as a tensor for further computation. But when I convert the list of tensors to a single tensor using torch.stack, it becomes very slow and consumes a lot of memory. Is there any way I can optimize this loop? Thanks for your reply.

output = []
hx_0 = Variable(torch.randn(1000, 4))
for i in range(1000):
    hx = rnn(input[:, i, :], hx_0)
    output.append(hx)
    outs1 = torch.stack(output, 1)
    hx_0 = operation(outs1)

print(output)

You could pre-allocate output as an empty tensor and fill it using the index.
Something like this should work:

output = torch.empty(hx.size(0), 1000)
for i in range(1000):
    output[:, i] = hx

I’m not sure how hx is shaped in your case, so you would probably need to adapt your code a bit.

Thank you @ptrblck for your answer. However, my operation requires a tensor of shape Batch x i x Hidden as input and produces Batch x Hidden. The reason is that I want to compute the new hx_0 as a function of the hidden states (hn) stored in out. The shape at each step is provided below. Thank you.

rnn = nn.GRUCell(50, 4)  # Embedding_dim x Hidden
input = Variable(torch.randn(1000, 32, 50))  # Sequence_len x Batch x Embedding_dim
input = input.transpose(1, 0)  # Batch x Sequence_len x Embedding_dim
hx_0 = Variable(torch.randn(32, 4))  # Batch x Hidden

out = []
start = time.time()
for i in range(1000):
    hn = rnn(input[:, i, :], hx_0)
    # print("hn size", hn.size())  # Batch x Hidden
    out.append(hn.view([input.size(0), 1, 4]))

    # outs1 = torch.stack(out, 1)
    # print("output of stack size", outs1.size())  # Batch x i x Hidden
    hx_0 = operation(4)(out)
    print("output size after operation", hx_0.size())  # Batch x Hidden

output = out[-1]
print("outs after gru", output.size())  # Batch x Hidden
# print("Time is:", time.time() - start)

In addition, if I try your suggestion output[:, i] = hx, the following error is thrown:

expand(torch.FloatTensor{[32, 4]}, size=[32]): the number of sizes provided (1) must be greater or equal to the number of dimensions in the tensor (2)
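
If it helps, I think the mismatch comes from the suggested buffer being 2-D: output[:, i] is then a 1-D slice of shape [32], while hn has shape [32, 4]. Here is a tiny repro, with random tensors standing in for my rnn output:

import torch

batch_size, hidden = 32, 4
hn = torch.randn(batch_size, hidden)  # Batch x Hidden, like the GRUCell output

# 2-D buffer: output_2d[:, 0] is a 1-D slice of shape [32],
# so assigning the [32, 4] hidden state raises the expand error above
output_2d = torch.empty(batch_size, 1000)
# output_2d[:, 0] = hn  # RuntimeError: expand(...)

# 3-D buffer with an explicit hidden dimension accepts the assignment
output_3d = torch.empty(batch_size, 1000, hidden)
output_3d[:, 0, :] = hn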

Thanks for the info regarding the shapes!
Could you try the following code and see if it works:

batch_size = 32
hidden = 4
nb_samples = 1000

output = torch.zeros(batch_size, nb_samples, hidden)
for i in range(nb_samples):
    output[:, i:i+1, :] = torch.randn(batch_size, 1, hidden)
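
Applied to your GRUCell example, a rough sketch could look like the code below. I'm assuming your custom operation can consume the growing slice output[:, :i+1, :]; the .mean(dim=1) here is just a placeholder for it, and if you need gradients through earlier steps you should double-check that the in-place writes work for your backward pass.

import time
import torch
import torch.nn as nn

batch_size, embedding_dim, hidden, seq_len = 32, 50, 4, 1000

rnn = nn.GRUCell(embedding_dim, hidden)
inp = torch.randn(batch_size, seq_len, embedding_dim)  # Batch x Sequence_len x Embedding_dim
hx_0 = torch.randn(batch_size, hidden)                 # Batch x Hidden

# Pre-allocate the buffer once instead of stacking a growing list at every step
output = torch.zeros(batch_size, seq_len, hidden)

start = time.time()
for i in range(seq_len):
    hn = rnn(inp[:, i, :], hx_0)  # Batch x Hidden
    output[:, i, :] = hn
    # Placeholder for your operation: it should map Batch x (i+1) x Hidden -> Batch x Hidden
    hx_0 = output[:, :i + 1, :].mean(dim=1)
print("Time is:", time.time() - start)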

Thanks for your help.