maybe repeat is the better fit here, though I cannot give a strong argument for one over the other. we can work through an example to see how they behave. suppose we have,

```
import torch

a = torch.randn(3, 3); a
```

```
tensor([[-1.6339,  0.5842,  0.3604],
        [ 0.2360,  0.5282,  1.8202],
        [-0.2033, -0.4426, -0.0329]])
```

suppose we want a repeated tensor of shape (2, 3, 3).

one way would be,

```
torch.repeat_interleave(a, 2).reshape(2, 3, 3)
```

and get,

```
tensor([[[-1.6339, -1.6339,  0.5842],
         [ 0.5842,  0.3604,  0.3604],
         [ 0.2360,  0.2360,  0.5282]],

        [[ 0.5282,  1.8202,  1.8202],
         [-0.2033, -0.2033, -0.4426],
         [-0.4426, -0.0329, -0.0329]]])
```
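it is worth noting why the reshape is needed there: without a dim argument, repeat_interleave flattens the input first and then repeats each element. a tiny sketch with made-up values:

```python
import torch

# hypothetical 2x2 tensor, just to make the element order visible
a = torch.tensor([[1., 2.], [3., 4.]])

# with no dim argument, the input is flattened first,
# then every element is repeated in place
out = torch.repeat_interleave(a, 2)
print(out)  # tensor([1., 1., 2., 2., 3., 3., 4., 4.])
```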

another way would be,

```
torch.repeat_interleave(a, torch.tensor([2]), dim=0).reshape(2, 3, 3)
```

we get,

```
tensor([[[-1.6339,  0.5842,  0.3604],
         [-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202]],

        [[ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329],
         [-0.2033, -0.4426, -0.0329]]])
```
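as an aside, I believe a one-element repeats tensor is broadcast across the dimension, so a plain scalar repeat count should give the same result; a quick check:

```python
import torch

a = torch.randn(3, 3)

# a one-element repeats tensor broadcasts to every row along dim=0,
# so a scalar repeat count produces the identical (6, 3) result
x = torch.repeat_interleave(a, torch.tensor([2]), dim=0)
y = torch.repeat_interleave(a, 2, dim=0)
print(torch.equal(x, y))  # True
```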

or we use repeat,

```
a.repeat(2, 1).reshape(2, 3, 3)
```

```
tensor([[[-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329]],

        [[-1.6339,  0.5842,  0.3604],
         [ 0.2360,  0.5282,  1.8202],
         [-0.2033, -0.4426, -0.0329]]])
```
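if I am not mistaken, the repeat + reshape here just stacks two full copies of a along a new leading dim, so unsqueezing first (or expand, which avoids copying) gives the same tensor; a quick check:

```python
import torch

a = torch.randn(3, 3)

# repeat(2, 1) makes a (6, 3) tensor of two stacked copies;
# reshaping to (2, 3, 3) is the same as repeating along a new batch dim
stacked = a.repeat(2, 1).reshape(2, 3, 3)
batched = a.unsqueeze(0).repeat(2, 1, 1)
viewed = a.expand(2, 3, 3)  # a broadcasted view, no copy made

print(torch.equal(stacked, batched))  # True
print(torch.equal(stacked, viewed))   # True
```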

if we were applying something like MaxPool after this, then in the first two cases a 2x2 window would see duplicated values next to each other, something like,

```
[-1.6339, 0.5842]
[-1.6339, 0.5842]
```

which seems a bit odd, since the pooling ends up comparing identical values.
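to make that concrete, a small sketch with made-up integer values; the window below is what a 2x2 pooling kernel would see in the row-interleaved case:

```python
import torch

# hypothetical values so the duplication is easy to spot
a = torch.arange(1., 10.).reshape(3, 3)

# row-wise interleave, as in the second variant above
rep = torch.repeat_interleave(a, 2, dim=0).reshape(2, 3, 3)

# the top-left 2x2 window a MaxPool kernel would see
window = rep[0, :2, :2]
print(window)
# both rows are the same duplicated row of `a`
print(torch.equal(window[0], window[1]))  # True
```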

or if we were applying some Dropout variant that drops 2x2 blocks at random, then an entire block of,

```
[-1.6339, 0.5842]
[-1.6339, 0.5842]
```

might become zero, and since both copies of those values sit inside that one block, we would not have them anywhere else. this again seems a bit odd.
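a small sketch of that concern, with made-up values and a hand-written mask standing in for an actual dropout layer (as far as I know there is no built-in 2x2 block dropout):

```python
import torch

# hypothetical values, chosen so every entry is distinct and nonzero
a = torch.arange(1., 10.).reshape(3, 3)

# interleaved variant: both copies of a value end up adjacent
inter = torch.repeat_interleave(a, 2, dim=0).reshape(2, 3, 3)
# plain repeat: the two copies live in separate full blocks
stack = a.repeat(2, 1).reshape(2, 3, 3)

# zero out the same top-left 2x2 block in each
inter2, stack2 = inter.clone(), stack.clone()
inter2[0, :2, :2] = 0
stack2[0, :2, :2] = 0

# a[0, 0] == 1 disappears entirely from the interleaved version,
# but survives in the second copy of the stacked version
print(bool((inter2 == 1.).any()))  # False
print(bool((stack2 == 1.).any()))  # True
```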

or if we were applying something like nn.LayerNorm after this. if we have rows that consist of a single repeated value, like,

```
import torch
from torch import nn

a = torch.tensor([[1., 1., 1.], [2., 2., 2.]])
b = nn.LayerNorm(3)
```

then, b(a) would give

```
tensor([[0., 0., 0.],
        [0., 0., 0.]], grad_fn=<NativeLayerNormBackward>)
```

maybe we do not want to zero out most of the blocks.
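a quick sketch of the contrast, assuming the default LayerNorm affine parameters (weight 1, bias 0):

```python
import torch
from torch import nn

ln = nn.LayerNorm(3)

const = torch.tensor([[1., 1., 1.]])   # a row of identical values
varied = torch.tensor([[1., 2., 3.]])  # a row of distinct values

# zero variance means the whole row normalizes to zero,
# while a varied row keeps its relative structure
print(ln(const))   # all zeros
print(ln(varied))  # roughly [-1.22, 0.00, 1.22]
```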

it depends on what we apply after this repeat.