Thanks for the note. Yes, the code is readable and very useful for understanding what's going on.

I tried to change the forward algorithm like this:

```
def _forward_score(self, feats):
    def log_sum_exp_2(vecs):
        # numerically stable log-sum-exp over dim 1
        max_scores, _ = torch.max(vecs, 1)
        max_scores_exp = max_scores.view(-1, 1).expand_as(vecs)
        return max_scores + torch.log(torch.sum(torch.exp(vecs - max_scores_exp), 1))

    init_alphas = torch.Tensor(1, self.tag_size).fill_(-10000.)
    init_alphas[0][START_TAG] = 0.
    init_variables = Variable(init_alphas)

    def iter_forward(variables, feature_list):
        if len(feature_list) == 0:
            end_variables = variables + self.transitions[STOP_TAG].view(1, -1)
            return log_sum_exp_2(end_variables)
        head_feat, tail_feats = feature_list[0], feature_list[1:]
        head_feat_exp = head_feat.view(self.tag_size, 1).expand(self.tag_size, self.tag_size)
        variables_exp = variables.expand(self.tag_size, self.tag_size)
        next_tag_variables_exp = variables_exp + self.transitions + head_feat_exp
        new_forward_variables = log_sum_exp_2(next_tag_variables_exp).view(1, self.tag_size)
        return iter_forward(new_forward_variables, tail_feats)

    return iter_forward(variables=init_variables, feature_list=feats)
```

But I still found this approach is not a good fit for batch training. Any good suggestions?
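To make the question concrete, here is a rough sketch of the fully vectorized, batched version I'm aiming for. The function name `batched_forward_score`, the `(batch, seq_len, tag_size)` emission layout, and the `transitions[next, prev]` convention are my assumptions for illustration, not part of the original code:

```python
import torch

def batched_forward_score(feats, transitions, start_tag, stop_tag):
    """Log-partition scores for a whole batch at once.

    feats: (batch, seq_len, tag_size) emission scores.
    transitions[i][j]: score of moving from tag j to tag i.
    """
    batch_size, seq_len, tag_size = feats.shape
    # forward variables, one row per batch element; only START_TAG is live
    alphas = feats.new_full((batch_size, tag_size), -10000.)
    alphas[:, start_tag] = 0.
    for t in range(seq_len):
        # (batch, next, prev): previous alphas + transition + emission of next tag
        scores = (alphas.unsqueeze(1)
                  + transitions.unsqueeze(0)
                  + feats[:, t].unsqueeze(2))
        alphas = torch.logsumexp(scores, dim=2)  # sum out the previous tag
    # transition into STOP_TAG, then sum out the last real tag
    return torch.logsumexp(alphas + transitions[stop_tag].unsqueeze(0), dim=1)
```

The loop over time steps remains, but each step now handles the whole batch and all tag pairs in a single broadcasted addition, which I believe is the usual way to batch the CRF forward pass.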