How to do mini-batching with a dynamic computation graph

Hi all, I am new to frameworks with dynamic computation graphs. I searched everywhere but I couldn’t find a reference on how to implement mini-batching with RNNs, or even tree LSTMs, with varying-length input. So I guess my general question is how to do mini-batching with a dynamic computation graph. Thanks.


For RNNs, there’s already a batched variable-length example (on the SNLI dataset: https://github.com/pytorch/examples/tree/master/snli); there, all you have to do is sort the examples a little so that you can batch together sentences of similar lengths with minimal padding.
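The sorting-and-padding trick looks roughly like this (a minimal sketch with made-up names, not the actual code from the SNLI example):

import torch

def make_batches(examples, batch_size, pad_value=0):
    # examples: list of (token_id_tensor, label) pairs of varying lengths (hypothetical format).
    # Sort by length so each batch contains sequences of similar length.
    examples = sorted(examples, key=lambda ex: len(ex[0]), reverse=True)
    for i in range(0, len(examples), batch_size):
        chunk = examples[i:i + batch_size]
        lengths = [len(seq) for seq, _ in chunk]
        max_len = lengths[0]
        # Pad every sequence only up to the longest one in this batch.
        padded = torch.full((len(chunk), max_len), pad_value, dtype=torch.long)
        for row, (seq, _) in enumerate(chunk):
            padded[row, :len(seq)] = seq
        labels = torch.tensor([label for _, label in chunk])
        yield padded, lengths, labels

Because the examples were pre-sorted, each batch wastes very little computation on padding tokens.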

TreeRNNs are harder. I’ll add an example soon that does this, but the general idea for TreeRNNs is that batching is up to you as the user, and you should split and concatenate when you need to. So if you use a binary tree structure, you can represent it as a shift-reduce parser (see the SPINN paper from Bowman et al.), which means you can process multiple trees in parallel by doing preprocessing like this:

input:
 tree1: ((ab)c)
 tree2: (d(ef))
preprocessed input:
        1        2        3        4        5
 tree1: SHIFT(a) SHIFT(b) REDUCE   SHIFT(c) REDUCE
 tree2: SHIFT(d) SHIFT(e) SHIFT(f) REDUCE   REDUCE

and then using advanced indexing to copy all the tokens for SHIFT at each timestep in parallel, while concatenating the stack representations for a batched REDUCE.
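A very rough sketch of that batched loop, with a toy composition function standing in for a real TreeLSTM cell (all names here are made up for illustration, not the upcoming example):

import torch
import torch.nn as nn

class Composer(nn.Module):
    # Combines two child vectors into a parent vector (stand-in for a TreeLSTM cell).
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(2 * dim, dim)

    def forward(self, left, right):
        return torch.tanh(self.linear(torch.cat([left, right], dim=-1)))

def run_batch(ops_per_tree, embeddings_per_tree, composer):
    # ops_per_tree: per tree, a list of ops like ['SHIFT', 'SHIFT', 'REDUCE', ...],
    #   padded to a common length with 'NOOP' steps that do nothing.
    # embeddings_per_tree: per tree, the token embeddings to SHIFT, in order.
    stacks = [[] for _ in ops_per_tree]
    buffers = [iter(e) for e in embeddings_per_tree]
    n_steps = len(ops_per_tree[0])
    for t in range(n_steps):
        lefts, rights, reduce_ids = [], [], []
        for i, ops in enumerate(ops_per_tree):
            op = ops[t]
            if op == 'SHIFT':
                stacks[i].append(next(buffers[i]))
            elif op == 'REDUCE':
                right = stacks[i].pop()
                left = stacks[i].pop()
                lefts.append(left)
                rights.append(right)
                reduce_ids.append(i)
        if reduce_ids:
            # One batched call to the composition function covers every REDUCE at this step.
            parents = composer(torch.stack(lefts), torch.stack(rights))
            for k, i in enumerate(reduce_ids):
                stacks[i].append(parents[k])
    # After all ops, each stack holds a single vector: the root representation of that tree.
    return [stack[-1] for stack in stacks]

In the real thing you’d use indexing into shared buffers rather than Python lists, but the structure of the computation is the same.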

Sorry if this is confusing, I promise an example will be up soon.

I would add that PyTorch is impressively fast on TreeRNNs even without batching.


Thank you for the pointer. So it’s actually up to the user to design the batching mechanism. Thank you all for building PyTorch with such amazing flexibility and great tutorials.

Just curious, is there any plan to release a technical report about the performance of PyTorch compared to other frameworks that support dynamic computation graphs?

For now I think we just have to say that we’re quite fast :slight_smile: We’d rather have someone independently benchmark the frameworks, or do a collaboration where maintainers of each implement the same script. Otherwise the benchmarks can end up being a bit biased, because we don’t know other libraries nearly as well as we do PyTorch.

That makes sense. Thanks for the great work.

In this particular SNLI example, there is an import of the module torchtext in train.py.
Does that module exist? I can’t find it.

examples/snli/train.py

train.py: from torchtext import data
train.py: from torchtext import datasets

@rituk see this repo.


Using BucketIterator, which produces minibatches with minimal padding, would work fine for SNLI since it’s a sentence classification task. However, for sequence tagging tasks, I think having padded inputs without gradient masking (even when the number of pads is minimized) isn’t a good idea, since the padded positions in the targets would still contribute gradients.


Yes, this kind of padding is a stopgap until PyTorch has full masked RNN support, which is on its way.
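In the meantime, one common workaround for tagging is to mask the padded targets out of the loss, e.g. with ignore_index (a small sketch, assuming padded target positions are filled with a dedicated pad index):

import torch
import torch.nn as nn

PAD_IDX = 0  # assumed index used to fill padded target positions
criterion = nn.CrossEntropyLoss(ignore_index=PAD_IDX)

# logits: (batch, seq_len, n_tags); targets: (batch, seq_len) with PAD_IDX at padded steps.
logits = torch.randn(4, 10, 7, requires_grad=True)
targets = torch.randint(1, 7, (4, 10))
targets[:, 8:] = PAD_IDX  # pretend the last two steps of every sequence are padding

# Positions whose target equals PAD_IDX contribute neither to the loss nor to the gradients.
loss = criterion(logits.view(-1, 7), targets.view(-1))
loss.backward()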

What if we are facing arbitrary trees rather than binary ones? This corresponds to the child-sum TreeLSTM, where each node can have a different number of children. Is it still possible to batch with the shift-reduce strategy?

It’s possible but a lot harder.

I hear that tensorflow-fold is able to batch trees of arbitrary shapes. Are there similar implementations in PyTorch? Why is nobody trying to make such a tool?

for what it’s worth, I could install torchtext using:

pip install git+https://github.com/pytorch/text.git

(in a virtualenv environment, otherwise try with --user)

What’s an example for a feedforward NN or CNN? When I try to index my torch tensors with numpy arrays, it says I can’t/shouldn’t be using numpy to index things. As in:

def get_batch(X,Y,M):
    N = len(Y)
    valid_indices = np.array( range(N) )
    batch_indices = np.random.choice(valid_indices,size=M,replace=False)
    batch_xs = X[batch_indices,:]
    batch_ys = Y[batch_indices]
    return batch_xs, batch_ys

where X and Y are torch tensors (or variables).

I think my code runs now, but it seems there has to be a better way than doing:

def get_batch2(X,Y,M,dtype):
    X,Y = X.data.numpy(), Y.data.numpy()
    N = len(Y)
    valid_indices = np.array( range(N) )
    batch_indices = np.random.choice(valid_indices,size=M,replace=False)
    batch_xs = torch.FloatTensor(X[batch_indices,:]).type(dtype)
    batch_ys = torch.FloatTensor(Y[batch_indices]).type(dtype)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)

I tried a couple of torch methods like gather and index_select but with no luck. Some of the things I tried:

    #valid_indices = torch.arange(0,N).numpy()
    #valid_indices = np.array( range(N) )
    #batch_indices = np.random.choice(valid_indices,size=M,replace=False)
    #indices = torch.LongTensor(batch_indices)
    #batch_xs, batch_ys = torch.index_select(X_mdl, 0, indices), torch.index_select(y, 0, indices)
    #batch_xs,batch_ys = torch.index_select(X_mdl, 0, indices), torch.index_select(y, 0, indices)

I wonder whether this silly moving back and forth between numpy and torch is actually slowing my code down! I hope not.
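The closest pure-torch version I could come up with, staying away from numpy entirely, is something like this (not sure it’s the recommended way):

import torch
from torch.autograd import Variable

def get_batch_torch(X, Y, M):
    # X, Y are plain torch tensors; sample M rows without replacement,
    # staying in torch the whole time (no round trip through numpy).
    N = Y.size(0)
    indices = torch.randperm(N)[:M]  # LongTensor of random row indices
    batch_xs = X.index_select(0, indices)
    batch_ys = Y.index_select(0, indices)
    return Variable(batch_xs, requires_grad=False), Variable(batch_ys, requires_grad=False)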

(As a side note, I also posted my question on SO: https://stackoverflow.com/questions/45113245/how-to-get-mini-batches-in-pytorch-in-a-clean-and-efficient-way
Are questions with this level of detail welcome on the PyTorch forum, or should one stick with SO?)