Slow torch.stack()


I would like help optimizing the following line of code, please. We are training an RNN, and when profiling our code we find that this line is by far our bottleneck:

signal_seq = torch.stack([self.full_signal[idx+i:idx+i+64] for i in reversed(range(0,-120, -6))], axis=0)

full_signal is a 1D cuda.FloatTensor, and idx is a positive integer.

This line is in the __getitem__() method of our Dataset, and it seems to take 16 s out of 18 s in our tests. When running torch.utils.bottleneck, cProfile reports that all of this time is apparently spent in the tensor() method.

Is there an efficient way of doing this, please?

Thank you in advance.


full_signal[idx-114:idx+64].unfold(0, 64, 6)
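To see why this is a drop-in replacement, here is a minimal sketch comparing the two versions. The data is made up (a CPU tensor from torch.arange instead of the real CUDA signal, and idx = 500 picked arbitrarily), and it assumes idx is at least 114 so the slice start is not negative. The list comprehension builds 20 windows of 64 samples whose starts go from idx-114 to idx in steps of 6, which is what unfold(0, 64, 6) produces over the 178-sample slice.

```python
import torch

# Stand-in data: the real full_signal is a 1D CUDA FloatTensor. A CPU tensor is
# used here so the check runs anywhere. idx = 500 is an arbitrary choice (>= 114).
full_signal = torch.arange(1000, dtype=torch.float32)
idx = 500

# Original approach: 20 separate slices, copied into a new tensor by stack.
stacked = torch.stack(
    [full_signal[idx + i : idx + i + 64] for i in reversed(range(0, -120, -6))],
    dim=0,
)

# unfold approach: a single strided view over the same 178 samples.
# Window size 64, step 6 -> (178 - 64) / 6 + 1 = 20 windows.
unfolded = full_signal[idx - 114 : idx + 64].unfold(0, 64, 6)

print(stacked.shape, unfolded.shape)   # both are torch.Size([20, 64])
print(torch.equal(stacked, unfolded))  # True
```

Note that unfold returns a view on full_signal rather than a copy, so if the result will be modified in place or needs to be contiguous, calling .clone() or .contiguous() on it is probably wise.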

Hello Eta_C, this is much faster indeed, thank you!

It turns out that our actual bottleneck was not here, though. It seems to be in the backward pass, which we didn’t really see in our profilers, I guess because of asynchronous operations. So it’s all good.
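For anyone hitting the same confusion: CUDA kernels are launched asynchronously, so a wall-clock timer or cProfile can charge GPU time to whichever later call happens to wait for the device. One common way around this is to call torch.cuda.synchronize() before reading the clock. Below is a rough sketch of that idea. The timed helper and the forward_backward function are hypothetical stand-ins, not code from this thread, and on a machine without a GPU the synchronization is simply skipped.

```python
import time
import torch

def timed(fn, *args):
    """Run fn(*args) and return (result, seconds), syncing CUDA if present."""
    use_cuda = torch.cuda.is_available()
    if use_cuda:
        torch.cuda.synchronize()  # let previously queued kernels finish first
    start = time.perf_counter()
    result = fn(*args)
    if use_cuda:
        torch.cuda.synchronize()  # wait for the kernels launched by fn
    return result, time.perf_counter() - start

# Hypothetical stand-in for the forward and backward pass of one training step.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(256, 64, device=device, requires_grad=True)

def forward_backward(inp):
    loss = (inp @ inp.T).sum()
    loss.backward()
    return loss.detach()

_, seconds = timed(forward_backward, x)
print(f"forward + backward: {seconds:.4f} s")
```

Timing the data loading, forward pass, and backward pass separately with a helper like this should give a more trustworthy picture of where the time goes than unsynchronized measurements.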