Any tips for CNN + RNN implementation?

I’m implementing a model in which a CNN extracts a feature sequence from segments of a time series, and an RNN analyzes that feature sequence and outputs a classification result. I have run into several problems, such as vanishing gradients and out-of-memory errors. If anyone has done, or is doing, something similar, your experience would be valuable to me. The core idea is implemented in the code below; let me know if you need more.

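import math

import numpy as np
import torch
from torch.autograd import Variable


def to_pieces(signal_batch, piece_size):
    """
    Split a batch of signals into a larger batch of fixed-size pieces
    :param signal_batch : the batch of input signals, its size should be (batch_size x num_channels x *)
    :param piece_size : the size of every piece (int)
    :return: pieces : pieces generated, size of (* x num_channels x piece_size)
            index : number of pieces generated by each signal (batch_size)
    """
    pieces = []
    index = []
    for signal in signal_batch:
        num_piece = math.ceil(signal.size(1) / piece_size)
        index.append(num_piece)
        # If the length is not a multiple of piece_size, zero-pad the tail so the
        # last piece is full-sized; torch.stack requires equally shaped pieces
        pad = num_piece * piece_size - signal.size(1)
        if pad > 0:
            signal = torch.cat([signal, signal.new_zeros(signal.size(0), pad)], dim=1)
        pieces.extend(torch.split(signal, piece_size, dim=1))
    # Piece counts are integers, so return them as a LongTensor
    return torch.stack(pieces), torch.LongTensor(index)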

for epoch in range(opt.niter):
    # Train
    for i, data in enumerate(train_loader, 0):
        model.zero_grad()
        rnn.zero_grad()
        signal_cpu, label_cpu = data['signal'], data['label']
        # Split a batch of varied-length signals into pieces, and record the number of pieces for each signal.
        # For CNN, the input will look like a batch of signals, except that batch_size is replaced by
        # total number of pieces
        pieces, num_pieces = to_pieces(signal_cpu, opt.recordLength)
        batch_size = len(num_pieces)
        if opt.cuda:
            pieces = pieces.cuda()
        input.resize_as_(pieces).copy_(pieces)
        inputv = Variable(input)
        # a VGG-16 CNN model here
        cnnout = model(inputv)
        # Re-organize the output of CNN model, group output pieces of each signal into sequences
        # Copy the CNN output to the CPU once, instead of once per signal
        cnnout_cpu = cnnout.cpu()
        cum_num_pieces = [0]
        cum_num_pieces.extend(int(n) for n in np.cumsum(num_pieces.numpy()))
        features = [cnnout_cpu[cum_num_pieces[i]:cum_num_pieces[i + 1]] for i in range(len(num_pieces))]
        # Sort the sequences by length, as required by sequence packing in the next step
        # (named sorted_lengths to avoid shadowing the built-in sorted)
        sorted_lengths, indices = torch.sort(num_pieces, descending=True)
        features, label_cpu = [features[i] for i in indices], [label_cpu[i] for i in indices]
        label_cpu = torch.LongTensor(np.array(label_cpu))
        # Pad and pack the sequences; the pad_sequence function is taken from PyTorch 0.4
        packed_sequence = torch.nn.utils.rnn.pack_padded_sequence(
            pad_sequence(features, batch_first=True), sorted_lengths.int().numpy().tolist(), batch_first=True)
        # A 2 layer LSTM here
        output, _ = rnn(packed_sequence)

        out, length_sequence = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True)

        # Get the last output of LSTM for every sequence
        length_sequence = Variable(torch.LongTensor(length_sequence)-1)
        idx = length_sequence.view(-1,1).expand(out.size(0), out.size(2)).unsqueeze(1).long()
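        # squeeze(1) removes only the time dimension; a bare squeeze() would also
        # drop the batch dimension when the batch holds a single sequence
        final_out = out.gather(1, idx).squeeze(1)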

        label.resize_(batch_size).copy_(label_cpu)

        labelv = Variable(label)
        # Cross Entropy Loss
        err = criterion(final_out, labelv)
        err.backward()
        optimizer.step()

        print('[%d/%d][%d/%d] Loss: %.4f' % (epoch, opt.niter, i, len(train_loader), err.data[0]))
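
On recent PyTorch releases, the sort, pad, pack and gather steps above can probably be shortened. The following is only a sketch under assumptions: it relies on the `enforce_sorted=False` argument of `pack_padded_sequence`, which is not available in the 0.3/0.4 versions this code targets, and the feature sizes and random tensors are placeholders standing in for the CNN output.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

torch.manual_seed(0)
# Placeholder per-signal feature sequences: (num_pieces, feature_dim)
features = [torch.randn(n, 8) for n in (3, 5, 2)]
lengths = [f.size(0) for f in features]

rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, batch_first=True)

# enforce_sorted=False lets PyTorch sort internally and restore the original order,
# so the labels do not need to be re-ordered
packed = pack_padded_sequence(
    pad_sequence(features, batch_first=True), lengths,
    batch_first=True, enforce_sorted=False)

# h_n has shape (num_layers, batch, hidden); h_n[-1] is the top layer's hidden
# state at each sequence's own last valid step
_, (h_n, _) = rnn(packed)
final_out = h_n[-1]
```

For a unidirectional LSTM, `final_out` here plays the same role as the gathered last output, without unpacking the sequence first.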

I don’t understand why parts of the code are shown in a different font. I’ll fix it right away if someone tells me how.

You used the blockquote tool.
The code formatting tool looks like </>
To do it manually, put three backticks on a separate line just before your code block, then three more backticks on a separate line after it.

Thank you very much! Now it looks much better. :grin:

Update: an out-of-memory error is reported after 98 epochs of training.
Does any part of my code cause a memory leak?

I have an idea for this question. I could apply only the convolution layers to the entire signal and feed the resulting feature map into the LSTM. This should make the code much clearer. However, it may conflict with batch processing, since the signals have different lengths. Any insights on this?
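
One way this idea might still work with batches is to zero-pad the signals to a common length, run the convolutions, and then work out how many output steps of each signal are valid before packing. The sketch below is only an illustration: `ConvLSTM` is a hypothetical name, the two small `Conv1d` layers are placeholders and not the VGG-16 configuration, and it assumes unpadded convolutions and a PyTorch version that supports `enforce_sorted=False`.

```python
import torch
from torch import nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence


class ConvLSTM(nn.Module):
    """Hypothetical model: 1-D convolutions over whole signals, then an LSTM."""

    def __init__(self, num_channels, num_classes, hidden_size=32):
        super().__init__()
        # Placeholder layers; each signal must be at least 23 samples long
        # for these kernel sizes and strides to leave one output step
        self.convs = nn.ModuleList([
            nn.Conv1d(num_channels, 16, kernel_size=7, stride=4),
            nn.Conv1d(16, 32, kernel_size=5, stride=2),
        ])
        self.rnn = nn.LSTM(32, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, signals):
        # signals: list of (num_channels, length) tensors of varied length
        lengths = [s.size(1) for s in signals]
        # pad_sequence pads along the first dimension, so pad as (length, channels)
        x = pad_sequence([s.t() for s in signals], batch_first=True)
        x = x.transpose(1, 2)  # (batch, channels, time)
        for conv in self.convs:
            x = torch.relu(conv(x))
            k, stride = conv.kernel_size[0], conv.stride[0]
            # Output length of an unpadded, undilated convolution
            lengths = [(n - k) // stride + 1 for n in lengths]
        packed = pack_padded_sequence(
            x.transpose(1, 2), lengths, batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.rnn(packed)
        # Top-layer hidden state at each signal's last valid step
        return self.fc(h_n[-1])


torch.manual_seed(0)
model = ConvLSTM(num_channels=2, num_classes=3).eval()
signals = [torch.randn(2, n) for n in (100, 64, 37)]
logits = model(signals)  # one row of class scores per signal
```

Because the convolutions use no padding, the output steps counted as valid never read the zero-padded tail, so each signal should get the same result as if it were processed alone.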

A possible reason for the ‘out of memory’ error mentioned earlier is that the size of the larger batch generated by to_pieces is variable. It depends on the length of the input records, and may have caused the error when it became too large.
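
If that is the cause, one possible workaround is to build batches by a budget on the total number of pieces rather than by a fixed number of signals. The helper below is a hypothetical sketch, not something from the code above: `batches_by_piece_budget` and `max_pieces` are made-up names, and it assumes the length of every signal is known before the batches are formed.

```python
import math


def batches_by_piece_budget(signal_lengths, piece_size, max_pieces):
    """Group signal indices so each batch yields at most max_pieces pieces.

    A single signal that exceeds the budget on its own still gets a batch to itself.
    """
    batches, current, current_pieces = [], [], 0
    for idx, length in enumerate(signal_lengths):
        n = math.ceil(length / piece_size)  # same count as in to_pieces
        # Close the current batch if adding this signal would exceed the budget
        if current and current_pieces + n > max_pieces:
            batches.append(current)
            current, current_pieces = [], 0
        current.append(idx)
        current_pieces += n
    if current:
        batches.append(current)
    return batches


lengths = [1000, 250, 4000, 300, 300, 9000]
print(batches_by_piece_budget(lengths, piece_size=500, max_pieces=10))
# prints [[0, 1], [2, 3, 4], [5]]
```

The resulting list of index lists could be passed to `DataLoader` through its `batch_sampler` argument. The last signal in the example shows the remaining weak spot: a very long record still produces a large batch by itself, so it may also need to be truncated or handled separately.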