Dear Community,
I've been trying to test a bidirectional LSTM in PyTorch's C++ API, but I can't get it to work.
If I define an LSTM module with the option bidirectional(true), does it automatically create a reversed version
of the input sequence and feed that into the reverse LSTM? Does it also automatically concatenate the two output tensors into one that can be fed directly into a dense layer, and if so, how do I format it for that?
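For reference, the Python API (where `bidirectional=True` is implemented) answers both questions: the reversed pass is handled internally, and the output's last dimension is the forward and backward hidden states concatenated (`2 * hidden_size`), so it can go straight into a linear layer. A minimal Python sketch of that shape contract:

```python
import torch

# Bidirectional LSTM: 1 input feature, hidden size 4, 1 layer
lstm = torch.nn.LSTM(input_size=1, hidden_size=4, num_layers=1, bidirectional=True)
linear = torch.nn.Linear(2 * 4, 1)  # in_features = hidden_size * num_directions

x = torch.randn(3, 1, 1)      # [sequence, batch, feature]
output, (h_n, c_n) = lstm(x)  # output: [3, 1, 8] = [seq, batch, 2*hidden]
result = linear(output)       # [3, 1, 1]
print(output.shape, result.shape)
```

The C++ API mirrors this contract, so (once supported) the bidirectional output should be usable the same way.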
My Code:
auto bilstm = torch::nn::LSTM(torch::nn::LSTMOptions(1, 1).layers(1).bidirectional(true));
auto linear = torch::nn::Linear(2, 1);
auto input = torch::randn({ 3,1,1 }); //Sequence with 3 timesteps, 1 Batch, 1 Feature per timestep
try
{
    auto bi_out = bilstm->forward(input); //ERROR
    std::cout << bi_out.output;
    auto result = linear(bi_out.output.view({ 3, -1 }));
}
catch (const c10::Error& e)
{
    std::cout << e.what();
    //Odd number of params or hiddens given to a bidirectional RNN (pair_vec at ..\aten\src\ATen\native\RNN.cpp:135)
    //(no backtrace available)
}
When I run this, an exception is thrown when feeding the sequence into the BLSTM, saying that an odd number of parameters or hidden states was given to the bidirectional RNN. I see that the forward method of the BLSTM takes a second parameter for its state, but I don't think that one should be necessary, since I don't need a custom initial state.
OS: Windows 10
Libtorch: Nightly build for CUDA 10 Release 1.0.0
I would appreciate it a lot if somebody could give me a simple example of how to pass a tensor to and from a BLSTM layer in the C++ API.
Many thanks in advance.
Update:
This simple code here also produces an error, though a different one:
#include <torch/torch.h>
#include <iostream>
#define INPUTS 1
#define SEQUENCE 3
#define BATCH 1
#define LAYERS 2
#define HIDDEN 1
#define DIRECTIONS 2
int main()
{
    try
    {
        auto input = torch::randn({ SEQUENCE, BATCH, INPUTS });
        auto blstm = torch::nn::LSTM(
            torch::nn::LSTMOptions(INPUTS, HIDDEN)
            .layers(LAYERS).bidirectional(true));
        auto output = blstm->forward(input); //ERROR
    }
    catch (const c10::Error& e)
    {
        std::cout << e.what();
        //Expected more hidden states in stacked_rnn
        //(apply_layer_stack at ..\aten\src\ATen\native\RNN.cpp:515)
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
What am I doing wrong here?
Any help is appreciated.
Update 2:
Turns out, according to this issue: https://github.com/pytorch/pytorch/issues/17998, bidirectional
RNNs are not yet implemented in the C++ API, so I'll wait for that first.
Update 3:
So I've implemented a simple net with a bidirectional LSTM myself. According to my little test program, it seems to work. Here's my code; I hope it helps people who are stuck in the same situation as I am.
#include <torch/torch.h>
#include <iostream>
#include <vector>
#define INPUTS 1
#define SEQUENCE 3
#define BATCH 1
#define LAYERS 3
#define HIDDEN 2
#define DIRECTIONS 2
#define OUTPUTS 1
struct BLSTM_Model : torch::nn::Module
{
    torch::nn::LSTM lstm{ nullptr };
    torch::nn::LSTM reverse_lstm{ nullptr };
    torch::nn::Linear linear{ nullptr };

    BLSTM_Model(uint64_t layers, uint64_t hidden, uint64_t inputs)
    {
        lstm = register_module("lstm", torch::nn::LSTM(
            torch::nn::LSTMOptions(inputs, hidden).layers(layers)));
        reverse_lstm = register_module("rlstm", torch::nn::LSTM(
            torch::nn::LSTMOptions(inputs, hidden).layers(layers)));
        linear = register_module("linear", torch::nn::Linear(
            hidden * DIRECTIONS, OUTPUTS));
    }

    torch::Tensor forward(torch::Tensor x)
    {
        //Feed the sequence into the forward LSTM, and the flipped
        //sequence into the reverse LSTM
        auto lstm1 = lstm->forward(x.view({ x.size(0), BATCH, -1 }));
        //lstm1.output: [SEQUENCE, BATCH, HIDDEN]
        auto lstm2 = reverse_lstm->forward(
            torch::flip(x, 0).view({ x.size(0), BATCH, -1 }));
        //Flip the reverse LSTM's output back into sequence order and
        //combine both outputs into one tensor.
        //Note: the views below assume BATCH == 1.
        auto cat = torch::empty({ DIRECTIONS, BATCH, x.size(0), HIDDEN });
        //[DIRECTIONS, BATCH, SEQUENCE, FEATURE]
        cat[0] = lstm1.output.view({ BATCH, x.size(0), HIDDEN });
        cat[1] = torch::flip(lstm2.output.view({ BATCH, x.size(0), HIDDEN }), 1);
        //Feed the concatenated outputs into the linear layer
        auto out = torch::sigmoid(linear->forward(
            cat.view({ BATCH, x.size(0), HIDDEN * DIRECTIONS })));
        //out: [BATCH, SEQUENCE, FEATURE]
        return out;
    }
};
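The trick above (run a second LSTM on the flipped sequence, flip its output back, and concatenate along the feature dimension) reproduces what `bidirectional=true` does internally. As a sanity check, here is a Python sketch that copies a built-in single-layer bidirectional LSTM's weights (the standard `weight_ih_l0` / `weight_ih_l0_reverse` parameter names) into two unidirectional LSTMs and confirms both constructions agree:

```python
import torch

torch.manual_seed(0)
seq, batch, inp, hid = 3, 1, 1, 2

# Reference: built-in bidirectional LSTM (single layer)
bi = torch.nn.LSTM(inp, hid, num_layers=1, bidirectional=True)

# Manual construction: one forward LSTM, one LSTM run on the flipped sequence
fwd = torch.nn.LSTM(inp, hid, num_layers=1)
bwd = torch.nn.LSTM(inp, hid, num_layers=1)

# Copy the bidirectional layer's weights into the two unidirectional LSTMs
with torch.no_grad():
    fwd.weight_ih_l0.copy_(bi.weight_ih_l0)
    fwd.weight_hh_l0.copy_(bi.weight_hh_l0)
    fwd.bias_ih_l0.copy_(bi.bias_ih_l0)
    fwd.bias_hh_l0.copy_(bi.bias_hh_l0)
    bwd.weight_ih_l0.copy_(bi.weight_ih_l0_reverse)
    bwd.weight_hh_l0.copy_(bi.weight_hh_l0_reverse)
    bwd.bias_ih_l0.copy_(bi.bias_ih_l0_reverse)
    bwd.bias_hh_l0.copy_(bi.bias_hh_l0_reverse)

x = torch.randn(seq, batch, inp)
ref, _ = bi(x)                     # [seq, batch, 2*hid]
out_f, _ = fwd(x)
out_b, _ = bwd(torch.flip(x, [0]))
# Flip the reverse output back and concatenate along the feature dimension
manual = torch.cat([out_f, torch.flip(out_b, [0])], dim=2)
print(torch.allclose(ref, manual, atol=1e-6))
```

Note that this only checks a single layer; with multiple stacked layers, a true bidirectional LSTM feeds the concatenated outputs of both directions into the next layer, whereas two separate stacked LSTMs (as in my model) keep the directions independent until the end.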
int main()
{
    //Input: 0.1, 0.2, 0.3 -> Expected Output: 0.4, 0.5, 0.6
    BLSTM_Model model = BLSTM_Model(LAYERS, HIDDEN, INPUTS);
    torch::optim::Adam optimizer(
        model.parameters(), torch::optim::AdamOptions(0.0001));

    //Input
    torch::Tensor input = torch::empty({ SEQUENCE, INPUTS });
    auto input_acc = input.accessor<float, 2>();
    size_t count = 0;
    for (float i = 0.1; i < 0.4; i += 0.1)
    {
        input_acc[count][0] = i;
        count++;
    }

    //Target
    torch::Tensor target = torch::empty({ SEQUENCE, OUTPUTS });
    auto target_acc = target.accessor<float, 2>();
    count = 0;
    for (float i = 0.4; i < 0.7; i += 0.1)
    {
        target_acc[count][0] = i;
        count++;
    }

    //Train
    for (size_t i = 0; i < 6000; i++)
    {
        optimizer.zero_grad(); //Clear old gradients; without this they accumulate across iterations
        torch::Tensor output = model.forward(input);
        auto loss = torch::mse_loss(output.view({ SEQUENCE, OUTPUTS }), target);
        std::cout << "Loss " << i << " : " << loss.item<float>() << std::endl;
        loss.backward();
        optimizer.step();
    }

    //Test: Response should be about (0.4, 0.5, 0.6)
    torch::Tensor output = model.forward(input);
    std::cout << output << std::endl;
    return EXIT_SUCCESS;
}
If there’s something wrong or you see a potential problem/risk with my code, please let me know.
With kind regards, Florian Korotschenko