Hello,
I am applying replicate padding of width i to the left side of the last dimension of a tensor X with shape
torch.Size([1024, 5, 10, 50])
to obtain (for i = 5, as an example) a tensor of size
torch.Size([1024, 5, 10, 55])
using the command
F.pad(X,pad=(i,0,0,0),mode='replicate')
Timing this operation in my notebook with %timeit
yields:
for i in range(1,20):
    print(i,end=': ')
    %timeit F.pad(X,pad=(i,0,0,0),mode='replicate')
1: 2.9 ms ± 89.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
2: 2.92 ms ± 97.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
3: 3.05 ms ± 26.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4: 3.1 ms ± 36.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
5: 3.1 ms ± 72.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
6: 3.21 ms ± 35.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
7: 3.2 ms ± 89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
8: 3.28 ms ± 111 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9: 3.4 ms ± 62.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10: 3.42 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11: 3.52 ms ± 89 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
12: 3.58 ms ± 96.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
13: 3.63 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
14: 3.75 ms ± 128 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
15: 3.82 ms ± 102 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
16: 3.82 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
17: 3.9 ms ± 62.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
18: 3.94 ms ± 156 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
19: 4.05 ms ± 103 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
This seems quite slow if it needs to be done over and over again. It is also a bit surprising that the time grows so noticeably with the padding width (from about 2.9 ms at i = 1 to about 4.05 ms at i = 19). Is there a way to do this faster? Otherwise, I would try applying the padding to X while it is still a NumPy array, although I haven't tested yet whether that would be faster.
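For reference, here is a rough sketch of the two alternatives I have in mind. The helper names are just mine, X is assumed to be a float32 CPU tensor filled with random values, and I have not timed either variant, so this only checks that they give the same result as F.pad:

```python
import numpy as np
import torch
import torch.nn.functional as F


def left_replicate_pad_cat(x, i):
    # Hypothetical helper: repeat the first slice along the last dimension
    # i times (as a view, via expand) and prepend it to x.
    edge = x[..., :1].expand(*x.shape[:-1], i)
    return torch.cat([edge, x], dim=-1)


def left_replicate_pad_numpy(a, i):
    # Hypothetical helper: np.pad with mode="edge" repeats the border value,
    # which is NumPy's counterpart to replicate padding.
    pad_width = [(0, 0)] * (a.ndim - 1) + [(i, 0)]
    return np.pad(a, pad_width, mode="edge")


X = torch.randn(1024, 5, 10, 50)  # assumed dtype/device: float32 on CPU
i = 5

ref = F.pad(X, pad=(i, 0, 0, 0), mode="replicate")
out_cat = left_replicate_pad_cat(X, i)
out_np = torch.from_numpy(left_replicate_pad_numpy(X.numpy(), i))

print(ref.shape)
print(torch.equal(ref, out_cat), torch.equal(ref, out_np))
```

Both variants should reproduce the output of F.pad exactly, since they only copy values; whether either one is actually faster would still need to be measured with %timeit on the same machine.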
Thanks!
Best, JZ