How to modify LSTM

xzyx · January 8, 2018, 7:08am

A very basic question: How could I implement my own LSTM by modifying the existing implementation? I read the source code but couldn’t find where the network structures are really implemented …

In the RNNBase module, it seems that these two statements in forward() are what actually run the computation?

func = self._backend.RNN(
    self.mode,
    self.input_size,
    self.hidden_size,
    num_layers=self.num_layers,
    batch_first=self.batch_first,
    dropout=self.dropout,
    train=self.training,
    bidirectional=self.bidirectional,
    batch_sizes=batch_sizes,
    dropout_state=self.dropout_state,
    flat_weight=flat_weight
)
output, hidden = func(input, self.all_weights, hx)

but then I don’t know what self._backend is… I tried to trace back through the code and got a bit lost.

alishir · January 8, 2018, 7:11am

Do you want to implement a custom LSTM cell or a custom LSTM network?

xzyx · January 8, 2018, 7:28am

I want to customize the LSTM cell.

Thanks!

alishir · January 8, 2018, 7:44am

Hmm, sorry, I don’t know how an LSTM cell could be customized.

jpeg729 · January 8, 2018, 8:39am

The LSTM class is implemented in C so it is hard to find and harder to customise. The LSTMCell class is implemented in python here, and the actual details of the calculation are implemented in python here.

Those links are for PyTorch v0.3.0. I assume you know how to find the corresponding master branch should you need to.
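
For reference, the per-step calculation itself is short. Here is a minimal sketch of one LSTM step written with plain tensor ops, assuming the same i, f, g, o gate ordering that nn.LSTMCell uses for its weights. The name lstm_cell_step is made up for illustration; this is not the library's own code.

```python
import torch
import torch.nn.functional as F


def lstm_cell_step(x, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
    """One LSTM time step (hypothetical helper, not part of PyTorch).

    x: (batch, input_size); hidden: tuple (h, c), each (batch, hidden_size).
    w_ih: (4 * hidden_size, input_size); w_hh: (4 * hidden_size, hidden_size).
    """
    hx, cx = hidden
    # All four gate pre-activations at once: (batch, 4 * hidden_size)
    gates = F.linear(x, w_ih, b_ih) + F.linear(hx, w_hh, b_hh)
    # Split into input, forget, cell-candidate and output gates
    ingate, forgetgate, cellgate, outgate = gates.chunk(4, dim=1)
    ingate = torch.sigmoid(ingate)
    forgetgate = torch.sigmoid(forgetgate)
    cellgate = torch.tanh(cellgate)
    outgate = torch.sigmoid(outgate)
    # New cell state and new hidden state
    cy = forgetgate * cx + ingate * cellgate
    hy = outgate * torch.tanh(cy)
    return hy, cy
```

Changing the lines that compute cy and hy (or the gate activations) is where a custom cell would differ from the standard one.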

xzyx · January 8, 2018, 5:55pm

Is it true that the forward function in the StackedRNN in the second link would be the python version of multi-layer LSTM/RNN?

I guess I’m mostly looking for a Python-only multi-layer RNN/LSTM as a reference so that I don’t have to start completely from scratch.

Thanks!

dgriff · January 8, 2018, 7:54pm

You can use this:

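# Imports and aliases inferred from how the names are used below
import math
import torch as th
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable as V
from torch import Tensor as T

class LSTM(nn.Module):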

    """
    An implementation of Hochreiter & Schmidhuber:
    'Long-Short Term Memory'
    http://www.bioinf.jku.at/publications/older/2604.pdf
    Special args:
    dropout_method: one of
            * pytorch: default dropout implementation
            * gal: uses GalLSTM's dropout
            * moon: uses MoonLSTM's dropout
            * semeniuta: uses SemeniutaLSTM's dropout
    """

    def __init__(self, input_size, hidden_size, bias=True, dropout=0.0, dropout_method='pytorch'):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bias = bias
        self.dropout = dropout
        self.i2h = nn.Linear(input_size, 4 * hidden_size, bias=bias)
        self.h2h = nn.Linear(hidden_size, 4 * hidden_size, bias=bias)
        self.reset_parameters()
        assert(dropout_method.lower() in ['pytorch', 'gal', 'moon', 'semeniuta'])
        self.dropout_method = dropout_method

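    def sample_mask(self):
        # Sample one Bernoulli keep-mask over the hidden units.
        # forward() reads self.mask for the 'gal' and 'moon' methods,
        # so call this first when using either of them.
        keep = 1.0 - self.dropout
        self.mask = V(th.bernoulli(T(1, self.hidden_size).fill_(keep)))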

    def reset_parameters(self):
        std = 1.0 / math.sqrt(self.hidden_size)
        for w in self.parameters():
            w.data.uniform_(-std, std)

    def forward(self, x, hidden):
        do_dropout = self.training and self.dropout > 0.0
        h, c = hidden
        h = h.view(h.size(1), -1)
        c = c.view(c.size(1), -1)
        x = x.view(x.size(1), -1)

        # Linear mappings
        preact = self.i2h(x) + self.h2h(h)

        # activations
        gates = preact[:, :3 * self.hidden_size].sigmoid()
        g_t = preact[:, 3 * self.hidden_size:].tanh()
        i_t = gates[:, :self.hidden_size]
        f_t = gates[:, self.hidden_size:2 * self.hidden_size]
        o_t = gates[:, -self.hidden_size:]

        # cell computations
        if do_dropout and self.dropout_method == 'semeniuta':
            g_t = F.dropout(g_t, p=self.dropout, training=self.training)

        c_t = th.mul(c, f_t) + th.mul(i_t, g_t)

        if do_dropout and self.dropout_method == 'moon':
            c_t.data.set_(th.mul(c_t, self.mask).data)
            c_t.data *= 1.0/(1.0 - self.dropout)

        h_t = th.mul(o_t, c_t.tanh())

        if do_dropout:
            if self.dropout_method == 'pytorch':
                F.dropout(h_t, p=self.dropout, training=self.training, inplace=True)
            if self.dropout_method == 'gal':
                h_t.data.set_(th.mul(h_t, self.mask).data)
                h_t.data *= 1.0/(1.0 - self.dropout)

        # Reshape for compatibility
        h_t = h_t.view(1, h_t.size(0), -1)
        c_t = c_t.view(1, c_t.size(0), -1)
        return h_t, (h_t, c_t)

xzyx · January 8, 2018, 9:09pm

I am confused … I guess you just have an LSTM cell here? Do you have a wrapper that turns this into a multi-layer LSTM?

(Also I am not sure why you have batch-normalization?)

Thanks!

dgriff · January 8, 2018, 10:46pm

I edited the post above to use the PyTorch version found here: https://github.com/pytorch/benchmark/blob/master/benchmarks/lstm_variants/lstm.py

I could not find the LSTM layer version, but I believe it is specialized for efficiency and therefore hard to modify. It is usually best to customize the cell version for custom behaviour; you can then wrap it to make a layer version if you need one. Hope this is helpful, and sorry I couldn’t find exactly what you are looking for.
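
To give an idea of what a layer version could look like, here is a rough sketch that loops a cell over time and stacks several layers. It assumes a cell with the nn.LSTMCell interface, i.e. cell(input, (h, c)) returns (h, c) on 2-D tensors; the class above takes and returns 3-D tensors, so it would need a small adapter. StackedLSTM and cell_cls are names made up for this example, and dropout between layers is left out.

```python
import torch
import torch.nn as nn


class StackedLSTM(nn.Module):
    """Hypothetical Python-only multi-layer LSTM built from per-step cells."""

    def __init__(self, input_size, hidden_size, num_layers=2, cell_cls=nn.LSTMCell):
        super().__init__()
        self.hidden_size = hidden_size
        # First layer sees the input; later layers see the previous layer's h
        sizes = [input_size] + [hidden_size] * (num_layers - 1)
        self.cells = nn.ModuleList([cell_cls(s, hidden_size) for s in sizes])

    def forward(self, x, hidden=None):
        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        if hidden is None:
            zeros = x.new_zeros(batch, self.hidden_size)
            hidden = [(zeros, zeros) for _ in self.cells]
        hidden = list(hidden)
        outputs = []
        for t in range(seq_len):
            inp = x[t]
            for i, cell in enumerate(self.cells):
                h, c = cell(inp, hidden[i])
                hidden[i] = (h, c)
                inp = h  # feed this layer's output to the next layer
            outputs.append(inp)
        # outputs: (seq_len, batch, hidden_size); hidden: list of (h, c) per layer
        return torch.stack(outputs), hidden
```

Swapping cell_cls for a custom cell class is the intended extension point. Expect this loop to be much slower than the built-in nn.LSTM.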

Roni_Kobrosly · May 14, 2018, 2:06am

Hi, I tried to implement your LSTM template for coding a custom cell but I’m getting tensor size mismatch errors. Here is my code: Size mismatch error when using custom LSTM cell. Did you encounter anything like this? Any suggestions?

Harikrishna.Vydana · October 29, 2018, 5:47pm

I could not access the link. Could you provide it?
Thanks

kangkang_li · July 3, 2019, 3:52am

Have you implemented a modified LSTMCell?

STU · August 13, 2020, 1:24pm

Hey, I read some blogs but I still can’t figure out how to modify the LSTMCell, especially when trying to use a bidirectional LSTM. Do you know how to deal with that?
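
For reference, one common way to get a bidirectional layer out of a cell is to run two independent cells, one over the sequence forwards and one backwards, and concatenate their outputs per time step. The sketch below assumes a cell with the nn.LSTMCell interface and fixed-length sequences (no padding or packed sequences); BiLSTMFromCells and cell_cls are names made up for this example.

```python
import torch
import torch.nn as nn


class BiLSTMFromCells(nn.Module):
    """Hypothetical single-layer bidirectional LSTM built from two cells."""

    def __init__(self, input_size, hidden_size, cell_cls=nn.LSTMCell):
        super().__init__()
        self.hidden_size = hidden_size
        self.fwd_cell = cell_cls(input_size, hidden_size)
        self.bwd_cell = cell_cls(input_size, hidden_size)

    def _run(self, cell, x, reverse):
        # x: (seq_len, batch, input_size); start from a zero state
        seq_len, batch, _ = x.shape
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        steps = range(seq_len - 1, -1, -1) if reverse else range(seq_len)
        outputs = [None] * seq_len
        for t in steps:
            h, c = cell(x[t], (h, c))
            outputs[t] = h  # store at the original time index
        return torch.stack(outputs)

    def forward(self, x):
        fwd = self._run(self.fwd_cell, x, reverse=False)
        bwd = self._run(self.bwd_cell, x, reverse=True)
        # (seq_len, batch, 2 * hidden_size): forward half, then backward half
        return torch.cat([fwd, bwd], dim=2)
```

A modified cell can then be passed as cell_cls, so the customization stays inside the cell and the direction handling stays in the wrapper.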