Can the for loop over matrix rows be avoided when there is a dependency between rows?

Hello, I have a simple question that has confused me a lot.
Suppose I have a tensor A of shape [batch_size, n, m], and I need to compute a tensor B of the same shape from A, where B[:, i, :] = f(B[:, i-1, :], A[:, i, :]) and f can be an arbitrarily complex function. The initial state B[:, 0, :] can be anything, say an all-zeros vector.

To get B, I need a for loop (starting at i = 1, since B[:, 0, :] is the given initial state):
for i in range(1, A.shape[1]):
    B[:, i, :] = f(B[:, i - 1, :], A[:, i, :])
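For context, here is a minimal runnable sketch of my setup (f_step, the shapes, and the zero initial state are placeholders I made up for illustration; I collect the rows in a list and stack them, since writing into B in place can break autograd):

```python
import torch

batch_size, n, m = 4, 128, 32
A = torch.randn(batch_size, n, m, requires_grad=True)

def f_step(prev, a):
    # stand-in for the real f; any differentiable function of (prev, a) works
    return torch.tanh(prev + a)

outs = [torch.zeros(batch_size, m)]            # B[:, 0, :], here an all-zeros initial state
for i in range(1, n):
    outs.append(f_step(outs[-1], A[:, i, :]))  # row i depends on row i - 1
B = torch.stack(outs, dim=1)                   # [batch_size, n, m]
B.sum().backward()                             # the backward pass has to walk all n steps
```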

However, this takes a lot of time, especially for the backpropagation of the gradient when A.shape[1] is very large.

For example, one f I encountered is B[:, i, :] = (B[:, i - 1, :] + cat([zeros(batch_size, 1), B[:, i - 1, :-1]], -1)) * A[:, i, :], i.e. the previous row plus a right-shifted copy of itself, multiplied elementwise by the current row of A.
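In runnable form (the name f_shift is just mine for illustration):

```python
import torch

def f_shift(prev, a):
    # prev, a: [batch_size, m]
    # shift prev right by one along the last dim, padding the first column with zeros
    shifted = torch.cat([torch.zeros_like(prev[:, :1]), prev[:, :-1]], dim=-1)
    return (prev + shifted) * a
```

Note that this particular f is linear in B[:, i - 1, :] (a fixed shift-and-add, then an elementwise scale by A[:, i, :]), in case that structure helps.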

How can I accelerate or avoid this for loop?

Thank you for your answers!