How to modify LSTM

A very basic question: How could I implement my own LSTM by modifying the existing implementation? I read the source code but couldn’t find where the network structures are really implemented …

In the RNNBase module, it seems that these two statements in forward() are what actually run the computation:

func = self._backend.RNN(
    self.mode,
    self.input_size,
    self.hidden_size,
    num_layers=self.num_layers,
    batch_first=self.batch_first,
    dropout=self.dropout,
    train=self.training,
    bidirectional=self.bidirectional,
    batch_sizes=batch_sizes,
    dropout_state=self.dropout_state,
    flat_weight=flat_weight
)
output, hidden = func(input, self.all_weights, hx)

But then I don’t know what self._backend is. I tried to trace back through the code and got a bit lost.

Do you want to implement a custom LSTM cell or a custom LSTM network?

I want to customize the LSTM cell.

Thanks!

Hmm, sorry, I don’t know how an LSTM cell could be customized.

The LSTM class is implemented in C, so it is hard to find and harder to customise. The LSTMCell class is implemented in Python here, and the actual details of the calculation are implemented in Python here.
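
For reference, here is a minimal sketch of what a single LSTM step looks like when written with plain tensor operations. The function name lstm_cell_step is made up for illustration and is not part of PyTorch. It assumes the weight layout of nn.LSTMCell (four gate blocks stacked in the order input, forget, cell, output), and the last lines compare it against nn.LSTMCell on random data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def lstm_cell_step(x, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
    """One LSTM time step (hypothetical helper, not a PyTorch API).

    x:      (batch, input_size)
    hidden: tuple (h, c), each of shape (batch, hidden_size)
    w_ih:   (4 * hidden_size, input_size), gate order i, f, g, o
    w_hh:   (4 * hidden_size, hidden_size), same gate order
    """
    h, c = hidden
    # Both linear maps produce all four gate pre-activations at once.
    gates = F.linear(x, w_ih, b_ih) + F.linear(h, w_hh, b_hh)
    i, f, g, o = gates.chunk(4, dim=1)
    i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
    g = torch.tanh(g)
    # Customizing the cell usually means changing these two lines.
    c_next = f * c + i * g
    h_next = o * torch.tanh(c_next)
    return h_next, c_next


# Compare against the built-in cell, reusing its parameters.
torch.manual_seed(0)
ref = nn.LSTMCell(input_size=3, hidden_size=5)
x = torch.randn(2, 3)
h0, c0 = torch.randn(2, 5), torch.randn(2, 5)
h1, c1 = lstm_cell_step(x, (h0, c0), ref.weight_ih, ref.weight_hh,
                        ref.bias_ih, ref.bias_hh)
h_ref, c_ref = ref(x, (h0, c0))
```

If the two results agree, the function can serve as a starting point: any change to the gate equations stays ordinary autograd-friendly Python.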

Those links are for PyTorch v0.3.0. I assume you know how to find the corresponding master branch should you need to.

Is it true that the forward function of StackedRNN in the second link is the Python version of a multi-layer LSTM/RNN?

I guess I’m mostly looking for a Python-only multi-layer RNN/LSTM as a reference so that I don’t have to start completely from scratch.

Thanks!

You can use this:

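import math

import torch as th
import torch.nn as nn
import torch.nn.functional as F
# Short aliases used by sample_mask() below
from torch import Tensor as T
from torch.autograd import Variable as V


class LSTM(nn.Module):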

    """
    An implementation of Hochreiter & Schmidhuber:
    'Long-Short Term Memory'
    http://www.bioinf.jku.at/publications/older/2604.pdf
    Special args:
    dropout_method: one of
            * pytorch: default dropout implementation
            * gal: uses GalLSTM's dropout
            * moon: uses MoonLSTM's dropout
            * semeniuta: uses SemeniutaLSTM's dropout
    """

    def __init__(self, input_size, hidden_size, bias=True, dropout=0.0, dropout_method='pytorch'):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bias = bias
        self.dropout = dropout
        self.i2h = nn.Linear(input_size, 4 * hidden_size, bias=bias)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=bias)
        self.reset_parameters()
        assert(dropout_method.lower() in ['pytorch', 'gal', 'moon', 'semeniuta'])
        self.dropout_method = dropout_method

    def sample_mask(self):
        keep = 1.0 - self.dropout
        self.mask = V(th.bernoulli(T(1, self.hidden_size).fill_(keep)))

    def reset_parameters(self):
        std = 1.0 / math.sqrt(self.hidden_size)
        for w in self.parameters():
            w.data.uniform_(-std, std)

    def forward(self, x, hidden):
        do_dropout = self.training and self.dropout > 0.0
        h, c = hidden
        h = h.view(h.size(1), -1)
        c = c.view(c.size(1), -1)
        x = x.view(x.size(1), -1)

        # Linear mappings
        preact = self.i2h(x) + self.h2h(h)

        # activations
        gates = preact[:, :3 * self.hidden_size].sigmoid()
        g_t = preact[:, 3 * self.hidden_size:].tanh()
        i_t = gates[:, :self.hidden_size]
        f_t = gates[:, self.hidden_size:2 * self.hidden_size]
        o_t = gates[:, -self.hidden_size:]

        # cell computations
        if do_dropout and self.dropout_method == 'semeniuta':
            g_t = F.dropout(g_t, p=self.dropout, training=self.training)

        c_t = th.mul(c, f_t) + th.mul(i_t, g_t)

        if do_dropout and self.dropout_method == 'moon':
            # sample_mask() must have been called before forward().
            # Mask and rescale the cell state without bypassing autograd.
            c_t = th.mul(c_t, self.mask) / (1.0 - self.dropout)

        h_t = th.mul(o_t, c_t.tanh())

        # Dropout on the hidden state
        if do_dropout:
            if self.dropout_method == 'pytorch':
                h_t = F.dropout(h_t, p=self.dropout, training=self.training)
            if self.dropout_method == 'gal':
                # sample_mask() must have been called before forward().
                h_t = th.mul(h_t, self.mask) / (1.0 - self.dropout)

        # Reshape for compatibility: (batch, hidden) -> (1, batch, hidden)
        h_t = h_t.view(1, h_t.size(0), -1)
        c_t = c_t.view(1, c_t.size(0), -1)
        return h_t, (h_t, c_t)

I am confused … I guess you just have an LSTM cell here? Do you have a version that wraps this into a multi-layer LSTM?

(Also I am not sure why you have batch-normalization?)

Thanks!

I edited the code above to the PyTorch version found here: https://github.com/pytorch/benchmark/blob/master/benchmarks/lstm_variants/lstm.py

I could not find the LSTM layer version, but I believe it is specialized for efficiency and therefore hard to modify. It is usually best to customize the cell version for custom behavior; you can then turn it into a layer version if you need one. I hope this is helpful, and sorry I couldn’t find exactly what you are looking for.
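
As a rough sketch of what a layer version could look like, the class below stacks per-step cells into a multi-layer LSTM that loops over time in Python. StackedLSTM and its cell_cls argument are hypothetical names, not PyTorch APIs. It assumes each cell follows the nn.LSTMCell interface, taking x of shape (batch, features) plus an (h, c) tuple and returning the new (h, c); the cell posted above uses (1, batch, features) tensors and returns h_t, (h_t, c_t), so it would need a small adapter first. Inter-layer dropout is left out.

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Hypothetical multi-layer LSTM built from single-step cells."""

    def __init__(self, input_size, hidden_size, num_layers,
                 cell_cls=nn.LSTMCell):
        super().__init__()
        self.hidden_size = hidden_size
        # The first layer reads the input; later layers read the layer below.
        self.cells = nn.ModuleList([
            cell_cls(input_size if layer == 0 else hidden_size, hidden_size)
            for layer in range(num_layers)
        ])

    def forward(self, x, state=None):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        if state is None:
            zeros = x.new_zeros(batch, self.hidden_size)
            state = [(zeros, zeros) for _ in self.cells]
        else:
            state = list(state)
        outputs = []
        for t in range(seq_len):
            inp = x[t]
            for layer, cell in enumerate(self.cells):
                h, c = cell(inp, state[layer])
                state[layer] = (h, c)
                inp = h  # output of this layer feeds the next one
            outputs.append(inp)
        # (seq_len, batch, hidden_size) plus the final (h, c) of every layer
        return torch.stack(outputs), state


model = StackedLSTM(input_size=4, hidden_size=6, num_layers=2)
out, final_state = model(torch.randn(5, 3, 4))
```

A Python time loop like this is much slower than the fused nn.LSTM, which is the price of being able to swap in any custom cell through cell_cls.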

Hi, I tried to implement your LSTM template for coding a custom cell but I’m getting tensor size mismatch errors. Here is my code: Size mismatch error when using custom LSTM cell. Did you encounter anything like this? Any suggestions?

I could not access the link. Can you provide it?
Thanks

Have you implemented a modified LSTMCell?

Hey, I read some blogs but I still can’t figure out how to modify the LSTMCell, especially when trying to use a bidirectional LSTM. Do you know how to deal with that?
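
One possible approach, sketched below under some assumptions: run one cell over the sequence from left to right and a second cell from right to left, then concatenate the two hidden states at each time step. BiLSTMFromCells and cell_cls are hypothetical names, not PyTorch APIs. The sketch assumes cells with the nn.LSTMCell interface and sequences of equal length (no padding or packed sequences).

```python
import torch
import torch.nn as nn


class BiLSTMFromCells(nn.Module):
    """Hypothetical single-layer bidirectional LSTM built from two cells."""

    def __init__(self, input_size, hidden_size, cell_cls=nn.LSTMCell):
        super().__init__()
        self.hidden_size = hidden_size
        self.fwd_cell = cell_cls(input_size, hidden_size)
        self.bwd_cell = cell_cls(input_size, hidden_size)

    def _run(self, cell, x, reverse):
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        steps = range(seq_len - 1, -1, -1) if reverse else range(seq_len)
        outputs = [None] * seq_len
        for t in steps:
            h, c = cell(x[t], (h, c))
            outputs[t] = h  # store at the original time index
        return torch.stack(outputs)

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        fwd = self._run(self.fwd_cell, x, reverse=False)
        bwd = self._run(self.bwd_cell, x, reverse=True)
        # (seq_len, batch, 2 * hidden_size), forward half first
        return torch.cat([fwd, bwd], dim=-1)


bi = BiLSTMFromCells(input_size=4, hidden_size=6)
bi_out = bi(torch.randn(5, 3, 4))
```

Because the two directions only meet at the final concatenation, a custom cell can be dropped in through cell_cls without touching the wrapper.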