Different latency when using repeat_interleave on CPU

I find that repeat_interleave has very different latency on the CPU depending on which dimension it repeats over. For example:

import time
import torch

a = torch.rand(1, 1024, 256, 100)
for dim in range(4):
    start = time.perf_counter()
    torch.repeat_interleave(a, 4, dim=dim)
    print(f"dim={dim}: {time.perf_counter() - start:.2f}s")

They take 0.22 s, 0.27 s, 0.34 s, and 2.25 s respectively.
I suspect this is related to how memory is allocated and copied: repeating over an earlier dimension copies large contiguous blocks, while repeating over the last dimension copies many tiny, scattered chunks. Other functions such as cat may have similar problems when applied over arbitrary dimensions. Is there any methodology to avoid such a huge latency? Thanks.
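For reference, here is one workaround I have seen suggested (this is my own sketch, not an official recipe): repeat_interleave along a given dimension can be reproduced by inserting a new trailing axis, expanding it (a zero-copy view), and then reshaping. If the duplicated values are only needed for broadcasting, you can stop at expand and avoid materializing the copy entirely.

```python
import torch

a = torch.rand(1, 1024, 256, 100)

# Baseline: repeat each element 4 times along the last dimension.
ref = torch.repeat_interleave(a, 4, dim=3)

# Equivalent construction: add a trailing axis, expand it
# (expand creates a stride-0 view, no copy), then reshape,
# which materializes the repeated tensor in one pass.
alt = a.unsqueeze(-1).expand(-1, -1, -1, -1, 4).reshape(1, 1024, 256, 400)

assert torch.equal(ref, alt)
```

Whether the expand + reshape route is actually faster likely depends on the tensor shape and the PyTorch version, so it is worth benchmarking on your workload; the clear win is when you can keep the expanded view and skip the copy altogether.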