Problem reshaping an LSTM output with view() when batch_first=True

Hey there,

I’m trying to implement a PyTorch module that wraps any module and applies the wrapped module’s operation to every time step of the input. This is pretty much the same as Keras’ TimeDistributed wrapper.

To do so, I want to simply reshape the input to two dimensions, apply the operation, and then reshape it back.
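Here’s a rough sketch of what I have in mind (the class name and details are just for illustration):

import torch.nn as nn

class TimeDistributed(nn.Module):
    # Applies `module` to every time step of a (batch, time, features) input.
    def __init__(self, module):
        super(TimeDistributed, self).__init__()
        self.module = module

    def forward(self, x):
        b, t = x.size(0), x.size(1)
        # collapse (batch, time, features) -> (batch * time, features)
        x_reshape = x.view(-1, x.size(-1))  # <- this is the call that fails
        y = self.module(x_reshape)
        # restore the time dimension: (batch, time, out_features)
        return y.view(b, t, -1)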

However, I’m having trouble with x_reshape = x.view(-1, x.size(-1)) when the nn.LSTM has batch_first=True, which gives me the error:

RuntimeError: input is not contiguous at /Users/soumith/code/pytorch-builder/wheel/pytorch-src/torch/lib/TH/generic/THTensor.c:231

Hopefully, someone here can help me solve this problem…

Cheers


view() only works on tensors that are contiguous in memory, so in general you have to write .contiguous().view(sizes).
Also, the Bottle mixin in the SNLI model script (in the examples repo) may do more or less what you’re looking for, at least for some layers (nn.Bottle was the name for this in Torch7).
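For the reshape in your question, that would be something like:

x_reshape = x.contiguous().view(-1, x.size(-1))  # copies only if x isn't already contiguous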


@jekbradbury

So, to avoid the extra .contiguous() call, is it strongly recommended to go with batch_first=False? I think that call copies memory under the hood to produce a contiguous array.

import torch as th

rnn = th.nn.LSTM(5, 10, 1, batch_first=True)  # batch_first=True

x = th.autograd.Variable(th.randn(1, 2, 5))   # batch size 1
out, state = rnn(x)
print(out.is_contiguous())    # True

x = th.autograd.Variable(th.randn(2, 2, 5))   # batch size 2
out, state = rnn(x)
print(out.is_contiguous())    # False

rnn = th.nn.LSTM(5, 10, 1)    # batch_first=False (seq_len first)

x = th.autograd.Variable(th.randn(2, 1, 5))   # batch size 1
out, state = rnn(x)
print(out.is_contiguous())    # True

x = th.autograd.Variable(th.randn(2, 2, 5))   # batch size 2
out, state = rnn(x)
print(out.is_contiguous())    # True

Yes, that’s right, although the cost of .contiguous() is usually small compared to the cost of the LSTM itself.
