Mixing Recurrent and Non-Recurrent Layers in Caffe2

Hi,

I’m trying to play around with, and understand, the use of LSTMs with Caffe2.

I’ve read and run the rather short tutorial/example, but found it not very tutorial-like: it doesn’t actually tell you anything beyond “run this script”.

What I’d like to do now is try to add an LSTM layer at the end of a number of other layers (say ReLU layers), creating a model roughly like this:

import numpy as np

from caffe2.python import brew, model_helper, workspace
from caffe2.python.rnn_cell import LSTM

arg_scope = {"order": "NCHW"}

model = model_helper.ModelHelper(arg_scope=arg_scope)

# Get the data (AddInput is my own helper, sketched below)
data, label = AddInput(model, batch_size=25,
                       db=train_data_path,
                       db_type='minidb')

# Define the layers
layer = brew.fc(model, data, 'dense_1', dim_in=13, dim_out=256)    
layer = brew.relu(model, layer, 'relu_1')
layer = brew.fc(model, layer, 'dense_2', dim_in=256, dim_out=256)
layer = brew.relu(model, layer, 'relu_2')
layer = brew.fc(model, layer, 'dense_3', dim_in=256, dim_out=256)
layer = brew.relu(model, layer, 'relu_3')

# One time step per example; sequence lengths are int32, as in the char_rnn example
workspace.FeedBlob(
    "seq_lengths",
    np.array([1] * 25, dtype=np.int32)
)

seq_lengths, target = \
    model.net.AddExternalInputs(
        'seq_lengths',
        'target',
    )

lstm_output, hidden_output, _, cell_state = LSTM(
    model,
    layer,              # output of relu_3
    seq_lengths,
    None,               # initial_states
    256,                # dim_in
    256,                # dim_out
    scope="LSTM",
    forward_only=False,
)

output = brew.fc(model, lstm_output, 'lstm_out', dim_in=256, dim_out=2, axis=2)

softmax = model.net.Softmax(output, 'softmax', axis=2)
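
For completeness, AddInput above is my own helper, modelled on the one from the MNIST tutorial. Here is a sketch; the exact reader setup is my assumption (the real DB holds 13-dimensional float feature vectors and integer labels):

def AddInput(model, batch_size, db, db_type):
    # Read float feature vectors and integer labels from the DB
    data, label = brew.db_input(
        model,
        blobs_out=['data', 'label'],
        batch_size=batch_size,
        db=db,
        db_type=db_type,
    )
    # The input data never needs gradients
    data = model.StopGradient(data, data)
    return data, label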

That is, the input to the LSTM is the output of the last ReLU layer. This seems to work when initializing the model and creating the net in memory (RunNetOnce, then CreateNet), but once the training loop starts calling RunNet I get the error below.
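
For reference, the run sequence looks roughly like this (a sketch; the real script does more, and the iteration count here is made up):

workspace.RunNetOnce(model.param_init_net)  # parameter initialization succeeds
workspace.CreateNet(model.net)              # net creation succeeds as well
for _ in range(100):                        # hypothetical iteration count
    workspace.RunNet(model.net)             # fails on the first call

The first RunNet call then fails with: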

WARNING:caffe2.python.workspace:Original python traceback for operator 9 in network model in exception above (most recent call last):
WARNING:caffe2.python.workspace:  File "lstm_test.py", line 157, in <module>
WARNING:caffe2.python.workspace:  File "lstm_test.py", line 54, in getModel
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/rnn_cell.py", line 1571, in _LSTM
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/rnn_cell.py", line 93, in apply_over_sequence
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/rnn_cell.py", line 491, in prepare_input
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/fc.py", line 58, in fc
WARNING:caffe2.python.workspace:  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/fc.py", line 54, in _FC_or_packed_FC
Traceback (most recent call last):
  File "lstm_test.py", line 182, in <module>
    workspace.RunNet(train_model.net)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 217, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at tensor.h:76] axis_index < ndims. 2 vs 2 Error from operator:
input: "relu_3" input: "LSTM/i2h_w" input: "LSTM/i2h_b" output: "LSTM/i2h" name: "" type: "FC" arg { name: "use_cudnn" i: 1 } arg { name: "cudnn_exhaustive_search" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "axis" i: 2 }

I suspect I don’t fully understand the interaction between the sequence lengths, the axis arguments, and what ndims means here, and that this is what’s causing the problem.
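
In case it helps, here is my current (possibly wrong) understanding of the shapes involved, plus the fix I was about to try. The blob names and the Reshape below are my own guesses, not anything from the tutorial:

# What I think the LSTM expects as input: a 3-D blob of shape
# (T, N, D) = (sequence length, batch size, input dim), so that its
# internal FC with axis=2 can act on the last dimension.
#
# What relu_3 actually is after brew.fc: a 2-D blob of shape
# (N, D) = (25, 256). There is no time axis, so axis=2 is out of
# range, which would match "axis_index < ndims. 2 vs 2".
#
# My guess (untested): insert an explicit time axis of length 1
# before the LSTM.
layer_3d, _ = model.net.Reshape(
    [layer], ['relu_3_3d', 'relu_3_old_shape'],
    shape=[1, 25, 256])

Is that the right way to go, or is there a proper way to hand a non-recurrent layer’s output to an LSTM? Any help is appreciated!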