I have a large 1D tensor `A` containing around 20M elements. I also have a list of spans with unequal lengths, i.e., `B = [(s_1, e_1), (s_2, e_2), ..., (s_n, e_n)]`, where `n` may be more than 8K. A single slice `A[s:e]` is very fast, but slicing all the spans in `B` with a for loop is very time-consuming. Is there any way to do this slicing in parallel on the GPU? My torch version is 1.8.1, and operations like `map_()` and `apply_()` are only available on CPU.
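For reference, this is the loop I am trying to avoid (a minimal sketch; `A` and `B` here are small stand-ins for the real data):

```python
import torch

A = torch.arange(1, 101)                 # stand-in for the real 20M-element tensor
B = [(0, 3), (6, 8), (10, 15)]           # stand-in spans (s, e)

# Baseline: one slice per span, concatenated at the end.
# This Python-level loop is what becomes slow when len(B) is large.
C = torch.cat([A[s:e] for s, e in B])
```

Each iteration launches its own slice and the Python loop overhead dominates once there are thousands of spans.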
For example:

```python
A = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
B = torch.tensor([[0, 3], [6, 8]])
C = UnknownFunction(A, B)
```

`C` should also be a 1D tensor, `tensor([1, 2, 3, 7, 8])`, i.e. each row `(s, e)` of `B` selects `A[s:e]` as above.
Thanks for your kind help in advance!