How to use Bidirectional LSTM in C++ API correctly?

(Florian Korotschenko) #1

Dear Community,

I’ve been trying out a bidirectional LSTM in PyTorch’s C++ API, but I can’t get it to work.
If I define an LSTM module with the option bidirectional(true), does it automatically create a reversed copy
of the input sequence and feed that into the reverse LSTM? Does it also automatically concatenate the two output tensors into one that can be fed directly into a dense layer, and if not, how should I format the output for that?
My code:

	auto bilstm = torch::nn::LSTM(torch::nn::LSTMOptions(1, 1).layers(1).bidirectional(true));
	auto linear = torch::nn::Linear(2, 1);
	auto input = torch::randn({ 3,1,1 }); //Sequence with 3 timesteps, 1 Batch, 1 Feature per timestep
	try
	{
		auto bi_out = bilstm->forward(input); //ERROR
		std::cout << bi_out.output;
		auto result = linear(bi_out.output.view({3,-1}));

	}
	catch (const c10::Error& e)
	{
		std::cout << e.what();
		//Odd number of params or hiddens given to a bidirectional RNN (pair_vec at ..\aten\src\ATen\native\RNN.cpp:135)
		//(no backtrace available)
	}

When I run this, feeding the sequence into the BLSTM throws an exception saying that an odd number of parameters or hidden states was given to the bidirectional RNN. I see that the BLSTM’s forward method takes a second parameter for the state, but I don’t think it should be necessary, since I don’t need a custom state.

OS: Windows 10
Libtorch: Nightly build for CUDA 10 Release 1.0.0

I would appreciate it a lot if somebody could give me a simple example of how to pass a tensor into and out of a BLSTM layer in the C++ API.
Many thanks in advance.
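
Regarding the "Odd number of params or hiddens" message: it suggests that, for a bidirectional RNN, the backend groups the weights and hidden states it receives into (forward, reverse) pairs, so a module that only supplies one direction's worth ends up with an odd count. The following is a simplified sketch of that idea in plain C++, not the actual ATen source; pair_up and entries_needed are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: how many per-direction entries (e.g. initial hidden
// states) a stacked RNN needs in total.
std::size_t entries_needed(std::size_t layers, bool bidirectional) {
    assert(layers > 0);
    return layers * (bidirectional ? 2 : 1);
}

// Simplified stand-in for the pairing step: consecutive entries are grouped
// as (forward, reverse). An odd count cannot be split into such pairs.
template <typename T>
std::vector<std::pair<T, T>> pair_up(const std::vector<T>& vals) {
    if (vals.size() % 2 != 0) {
        throw std::invalid_argument(
            "odd number of entries given to a bidirectional RNN");
    }
    std::vector<std::pair<T, T>> out;
    out.reserve(vals.size() / 2);
    for (std::size_t i = 0; i < vals.size(); i += 2) {
        out.emplace_back(vals[i], vals[i + 1]);
    }
    return out;
}
```

Under this reading, a 1-layer bidirectional LSTM needs two entries (one per direction), and passing a single one would trigger the exception.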

Update:
This simple code also produces an error, though a different one:

#include <torch/torch.h>
#include <iostream>

#define INPUTS 1
#define SEQUENCE 3
#define BATCH 1
#define LAYERS 2
#define HIDDEN 1
#define DIRECTIONS 2

int main()
{
	try
	{
		auto input = torch::randn({ SEQUENCE, BATCH, INPUTS });
		auto blstm = torch::nn::LSTM(
			torch::nn::LSTMOptions(INPUTS, HIDDEN)
				.layers(LAYERS).bidirectional(true));
		auto output = blstm->forward(input); //ERROR
	}
	catch (const c10::Error & e)
	{
		std::cout << e.what();
		//Expected more hidden states in stacked_rnn
		//(apply_layer_stack at ..\aten\src\ATen\native\RNN.cpp:515)
		return EXIT_FAILURE;
	}
	return EXIT_SUCCESS;
}

What am I doing wrong here?
Any help is appreciated.

Update 2:
Turns out, according to this request: https://github.com/pytorch/pytorch/issues/17998, bidirectional
RNNs are not yet implemented in the C++ API, so I’ll wait for that first.
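
For reference, the Python API documents the shapes a (sequence-first) bidirectional LSTM produces: the output is [seq_len, batch, num_directions * hidden_size], with both directions already concatenated along the last dimension, and the hidden and cell states are [num_layers * num_directions, batch, hidden_size]. Below is a small sketch that computes these shapes, assuming the C++ API follows the same convention; LstmShapes and lstm_shapes are hypothetical names, not part of libtorch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical container for the shapes described above.
struct LstmShapes {
    std::array<std::int64_t, 3> output;  // [seq_len, batch, directions * hidden]
    std::array<std::int64_t, 3> state;   // [layers * directions, batch, hidden]
};

// Computes the expected shapes for a sequence-first LSTM, following the
// convention documented for the Python API.
LstmShapes lstm_shapes(std::int64_t seq_len, std::int64_t batch,
                       std::int64_t hidden, std::int64_t layers,
                       bool bidirectional) {
    assert(seq_len > 0 && batch > 0 && hidden > 0 && layers > 0);
    const std::int64_t directions = bidirectional ? 2 : 1;
    LstmShapes s;
    s.output = {seq_len, batch, directions * hidden};
    s.state = {layers * directions, batch, hidden};
    return s;
}
```

With the constants from the update above (3 time steps, batch 1, hidden size 1, 2 layers), this gives an output of [3, 1, 2] and states of [4, 1, 1]. A dense layer placed after such an output would therefore need 2 * hidden_size input features.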

Update 3:
So I’ve implemented a simple net with a bidirectional LSTM myself. According to my little test program, it seems to be working. Here’s my code; I hope it helps people who are stuck in the same situation as I am.

#include <torch/torch.h>
#include <iostream>
#include <vector>

#define INPUTS 1
#define SEQUENCE 3
#define BATCH 1
#define LAYERS 3
#define HIDDEN 2
#define DIRECTIONS 2
#define OUTPUTS 1

struct BLSTM_Model : torch::nn::Module
{
	torch::nn::LSTM lstm{ nullptr };
	torch::nn::LSTM reverse_lstm{ nullptr };
	torch::nn::Linear linear{ nullptr };

	BLSTM_Model(uint64_t layers, uint64_t hidden, uint64_t inputs)
	{
		lstm = register_module("lstm", torch::nn::LSTM(
			torch::nn::LSTMOptions(inputs, hidden).layers(layers)));
		reverse_lstm = register_module("rlstm", torch::nn::LSTM(
			torch::nn::LSTMOptions(inputs, hidden).layers(layers)));
		linear = register_module("linear", torch::nn::Linear(
			hidden*DIRECTIONS, OUTPUTS));
	}

	torch::Tensor forward(torch::Tensor x)
	{
		//Feed the sequence into the forward LSTM and a time-reversed copy into the reverse LSTM
		auto lstm1 = lstm->forward(x.view({ x.size(0), BATCH, -1 }));
		//[SEQUENCE,BATCH,FEATURE]
		auto lstm2 = reverse_lstm->forward(torch::flip(x, 0).view({ x.size(0), BATCH, -1 }));
		//Flip the reverse LSTM's output back into forward time order
		auto reversed = torch::flip(lstm2.output, 0);
		//[SEQUENCE,BATCH,HIDDEN]
		//Concatenate both directions along the feature dimension, so that each
		//time step holds its forward and reverse features side by side.
		//(Stacking into [DIRECTIONS,...] and then calling view() would mix time steps.)
		auto cat = torch::cat({ lstm1.output, reversed }, 2);
		//[SEQUENCE,BATCH,HIDDEN*DIRECTIONS]
		//Feed into Linear Layer
		auto out = torch::sigmoid(linear->forward(cat));
		//[SEQUENCE,BATCH,OUTPUTS]
		return out;
	}
};

int main()
{
	//Input: 0.1, 0.2, 0.3 -> Expected Output: 0.4, 0.5, 0.6
	BLSTM_Model model = BLSTM_Model(LAYERS, HIDDEN, INPUTS);
	torch::optim::Adam optimizer(
		model.parameters(), torch::optim::AdamOptions(0.0001));
	//Input (filled by integer index, so the loop bound does not depend on float rounding)
	torch::Tensor input = torch::empty({ SEQUENCE, INPUTS });
	auto input_acc = input.accessor<float, 2>();
	for (int64_t t = 0; t < SEQUENCE; t++)
	{
		input_acc[t][0] = 0.1f * (t + 1);
	}
	//Target
	torch::Tensor target = torch::empty({ SEQUENCE, OUTPUTS });
	auto target_acc = target.accessor<float, 2>();
	for (int64_t t = 0; t < SEQUENCE; t++)
	{
		target_acc[t][0] = 0.1f * (t + 4);
	}
	//Train
	for (size_t i = 0; i < 6000; i++)
	{
		//Clear the gradients of the previous step; otherwise they accumulate
		optimizer.zero_grad();
		torch::Tensor output = model.forward(input);
		auto loss = torch::mse_loss(output.view({SEQUENCE, OUTPUTS}), target);
		std::cout << "Loss "<< i << " : " << loss.item<float>() << std::endl;
		loss.backward();
		optimizer.step();
	}
	//Test: Response should be about (0.4, 0.5, 0.6)
	torch::Tensor output = model.forward(input);
	std::cout << output << std::endl;
	return EXIT_SUCCESS;
}

If there’s something wrong or you see a potential problem/risk with my code, please let me know.
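
One point worth double-checking in any hand-rolled bidirectional layer is how the two direction outputs are merged. Stacking them along a new leading dimension and then reshaping to [SEQUENCE, 2 * HIDDEN] does not give the same memory layout as concatenating them along the feature dimension. The sketch below shows the difference on plain row-major buffers, without libtorch; concat_features and stack_directions are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Both inputs are row-major buffers of shape [seq_len, hidden], already in
// forward time order (the reverse pass has been flipped back).

// Feature-wise concatenation: row t of the result is fwd[t] followed by
// bwd[t], giving shape [seq_len, 2 * hidden].
std::vector<float> concat_features(const std::vector<float>& fwd,
                                   const std::vector<float>& bwd,
                                   std::size_t seq_len, std::size_t hidden) {
    assert(fwd.size() == seq_len * hidden && bwd.size() == seq_len * hidden);
    std::vector<float> out;
    out.reserve(2 * seq_len * hidden);
    for (std::size_t t = 0; t < seq_len; ++t) {
        for (std::size_t h = 0; h < hidden; ++h) out.push_back(fwd[t * hidden + h]);
        for (std::size_t h = 0; h < hidden; ++h) out.push_back(bwd[t * hidden + h]);
    }
    return out;
}

// Stacking along a new leading axis: memory holds all of fwd, then all of
// bwd (shape [2, seq_len, hidden]). Reinterpreting this buffer as
// [seq_len, 2 * hidden] does not move any element.
std::vector<float> stack_directions(const std::vector<float>& fwd,
                                    const std::vector<float>& bwd) {
    std::vector<float> out(fwd);
    out.insert(out.end(), bwd.begin(), bwd.end());
    return out;
}
```

With 3 time steps and a hidden size of 2, the first row of the concatenated buffer is (fwd[0], bwd[0]), whereas the first four values of the stacked buffer are (fwd[0], fwd[1]), so a dense layer applied per row would see features from two different time steps.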

With kind regards, Florian Korotschenko