I've found that `repeat_interleave` can have very different latency on the CPU depending on which dimension it operates over.
For example:

```python
import torch

a = torch.rand(1, 1024, 256, 100)
torch.repeat_interleave(a, 4, dim=0)
torch.repeat_interleave(a, 4, dim=1)
torch.repeat_interleave(a, 4, dim=2)
torch.repeat_interleave(a, 4, dim=3)
```
These cost 0.22 s, 0.27 s, 0.34 s, and 2.25 s respectively.
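For reference, here is a minimal way to reproduce the timings (using `time.perf_counter`; the exact numbers will of course vary by machine):

```python
import time
import torch

a = torch.rand(1, 1024, 256, 100)

for dim in range(4):
    start = time.perf_counter()
    torch.repeat_interleave(a, 4, dim=dim)
    elapsed = time.perf_counter() - start
    print(f"dim={dim}: {elapsed:.2f}s")
```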
I suspect this is related to memory access patterns during allocation and copying. It seems that other functions such as `cat` may have similar problems when applied over arbitrary dimensions. Is there any methodology to avoid such a huge latency? Thanks.
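One workaround I've been experimenting with (not sure if it's the idiomatic approach) is emulating `repeat_interleave` with `unsqueeze` + `expand` + `reshape`, which produces the same result for the slow last-dim case; the copy then happens in a single `reshape`-triggered contiguous kernel rather than element by element:

```python
import torch

a = torch.rand(1, 1024, 256, 100)

# Emulate torch.repeat_interleave(a, 4, dim=3):
# insert a size-1 dim after dim 3, broadcast it to 4 (no copy),
# then flatten it into dim 3 (one copy via reshape).
b = a.unsqueeze(4).expand(-1, -1, -1, -1, 4).reshape(1, 1024, 256, 400)

assert torch.equal(b, torch.repeat_interleave(a, 4, dim=3))
```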