How to do Mini-batch for LSTM-CRF?


Hi, I am new to Pytorch and machine learning as well.

I found this tutorial on the LSTM-CRF model very useful.

But I presume the code is not straightforward to adapt for GPU computing, given the default batch size of 1 in the example. I’m wondering whether there is a good approach to make the forward (and Viterbi) algorithms deal with batched input.

Thanks!

(Hugh Perkins) #2

To be fair, he says:


Thanks for the notice. Yeah, I mean the code is readable and very useful for understanding what’s going on.

I tried to change the forward algorithm like this:

def _forward_score(self, feats):
    def log_sum_exp(vecs):
        # row-wise log-sum-exp over dim 1; subtract each row's max
        # first so the exponentials cannot overflow
        max_scores, _ = torch.max(vecs, 1, keepdim=True)
        return max_scores.squeeze(1) + torch.log(torch.sum(torch.exp(vecs - max_scores), 1))

    init_alphas = torch.Tensor(1, self.tag_size).fill_(-10000.)
    init_alphas[0][START_TAG] = 0.
    init_variables = Variable(init_alphas)

    def iter_forward(variables, feature_list):
        if feature_list is None:
            end_variables = variables + self.transitions[STOP_TAG].view(1, -1)
            return log_sum_exp(end_variables)

        head_feat = feature_list[0]
        # slicing past the end just yields an empty sequence rather than
        # raising, so test the length instead of catching an exception
        tail_feats = feature_list[1:] if len(feature_list) > 1 else None

        head_feat_exp = head_feat.view(self.tag_size, 1).expand(self.tag_size, self.tag_size)
        variables_exp = variables.expand(self.tag_size, self.tag_size)
        next_tag_variables_exp = variables_exp + self.transitions + head_feat_exp
        new_forward_variables = log_sum_exp(next_tag_variables_exp).view(1, self.tag_size)

        return iter_forward(new_forward_variables, tail_feats)

    return iter_forward(variables=init_variables, feature_list=feats)

But I still found this is not a good approach for batch training. Any suggestions?
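One piece that does vectorize cleanly over a batch is the log-sum-exp helper itself. Below is a minimal sketch of a batched version; `log_sum_exp_batch` is a name I am introducing here, and note that recent PyTorch also ships `torch.logsumexp` as a built-in:

```python
import torch

def log_sum_exp_batch(vecs):
    # vecs: (batch, tag_size); returns a (batch,) tensor equal to
    # log(sum(exp(vecs), dim=1)), subtracting each row's max first
    # so the exponentials cannot overflow.
    max_scores, _ = torch.max(vecs, dim=1, keepdim=True)   # (batch, 1)
    return max_scores.squeeze(1) + torch.log(
        torch.sum(torch.exp(vecs - max_scores), dim=1))
```

Because the max is taken per row, each sequence in the batch is stabilized independently, so the same call serves batch size 1 or 100.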

(Lucky) #4

I have the same question:
how can the CRF layer be mini-batched in PyTorch?

(Lucky) #5

CRF layer in BiLSTM-CRF


I think one way to do it is to compute the forward variables at each time step for multiple tokens in a batch at once. With batch size 1 we have a sequence of length 3: w_11, w_12, w_13. For batch size 2 we then have
w_11, w_12, w_13
w_21, w_22, w_23

The above code assumes a batch size of 1 and already vectorizes the computation over tags within each iteration. I think we can add a batch dimension to that, but we still need to iterate over the time steps.
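That idea can be sketched roughly as follows. This is my own illustration, not code from the tutorial: the function name `forward_score_batch` is hypothetical, the transition convention follows the tutorial (`transitions[i][j]` is the score of moving from tag `j` to tag `i`), and it assumes every sequence in the batch has the same length (real data would need padding plus a mask):

```python
import torch

def forward_score_batch(feats, transitions, start_tag, stop_tag):
    # feats: (batch, seq_len, tag_size) emission scores from the BiLSTM
    # transitions: (tag_size, tag_size)
    batch_size, seq_len, tag_size = feats.size()

    # alphas: (batch, tag_size), initialised so only START_TAG has mass
    alphas = feats.new_full((batch_size, tag_size), -10000.)
    alphas[:, start_tag] = 0.

    for t in range(seq_len):                     # still iterate over time
        emit = feats[:, t].unsqueeze(2)          # (batch, tag_size, 1)
        trans = transitions.unsqueeze(0)         # (1, tag_size, tag_size)
        prev = alphas.unsqueeze(1)               # (batch, 1, tag_size)
        # scores[b, i, j] = alphas[b, j] + transitions[i, j] + feats[b, t, i]
        scores = prev + trans + emit
        alphas = torch.logsumexp(scores, dim=2)  # (batch, tag_size)

    terminal = alphas + transitions[stop_tag].unsqueeze(0)
    return torch.logsumexp(terminal, dim=1)      # (batch,) partition scores
```

The inner loop does the same work as the recursion in the code above, but broadcasting over a leading batch dimension, so every sequence in the batch advances one time step per iteration.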