I'm stuck converting a Keras LSTM to PyTorch, especially with data types

I’m trying to convert a model from Keras to PyTorch.
I think the problem is in the training part of the code.
Can anyone help me with this part?
I’m new to PyTorch…

This is the Keras model configuration.
It is a many-to-one model for prediction.
The data ( [ a1 a2 a3 a4 a5] ) is fed into the model, and the model predicts a6.

    model = Sequential()
    model.add(LSTM(10, return_sequences=True, input_shape=(windowsize,1)))

This is the PyTorch code.

import numpy as np
import torch.nn as nn
import torch
from torch.autograd import Variable
from numpy import array
from matplotlib import pyplot
from pandas import Series
from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from sklearn.preprocessing import MinMaxScaler
import torch.nn.functional as F

def timeseries_to_supervised(data, lag):

    dataX, dataY = [], []
    for i in range(len(data) - lag - 1):
        dataX.append(data[i:(i + lag)])
        dataY.append(data[i + lag])
    return dataX, dataY

def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    #One-dimensional ndarray with axis labels
    return np.asarray(diff)

def loaddata(path):
    series = read_csv(path, header=0, parse_dates=[0], index_col=0)  # remaining arguments are cut off in the original post
    return series

class mod(nn.Module):
    def __init__(self, feature_size, hidden_size, output_size):
        super(mod, self).__init__()

        self.hidden_size = hidden_size
        self.rnn1 = nn.RNN(input_size=feature_size, hidden_size=hidden_size)
        self.dense1 = nn.Linear(hidden_size, output_size)

    def forward(self,input, hidden):
        x, hidden = self.rnn1(input, hidden)
        x = x[- 1]
        x = x.view(-1, self.hidden_size)
        x = F.softmax(self.dense1(x))
        return x
    def init_hidden(self, num_layer, batch_size, hidden_size):
        return Variable(torch.zeros(num_layer, batch_size, hidden_size))

if __name__ == "__main__":

    #timestep = windowsize
    windowsize = 5
    #windowsize ...  x = [ 1, 2, 3, 4, 5 ] y = [ 6 ]
    outputsize = 1
    featuresize = 1
    scaler = MinMaxScaler(feature_range=(0, 1))

    series = loaddata('C:/Users/asdfw/Desktop/ICC_Example/data/BTCUSD.csv')
    diff_data = difference(series, 1)
    diff_data = diff_data.reshape(-1, 1)
    scaler = scaler.fit(diff_data)
    diff_raw = diff_data
    #diff_data = scaler.transform(diff_data)
    X, y = timeseries_to_supervised(diff_data, windowsize)

    # ----------------------------------------------------------#
    X = np.asarray(X)
    y = np.asarray(y)
    print('X is ', X.shape)
    print('Y is ', y.shape)


    batch_size = 1
    num_layer = 1
    hidden_size = 20


    model = mod(featuresize, hidden_size, outputsize)
    criterion = nn.L1Loss()
    hidden = model.init_hidden(num_layer, batch_size, hidden_size)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

    #-------------------------------------------Problem starts here---------------------------------------------------------------
    X = np.asarray(X[0,:,:])
    y = y[0,:]

    print(X, y)

    for epoch in range(0, 10):
        print('Input X/Y Size is ' ,X.size(), y.size())

        output = model(X, hidden)

        loss = criterion(output, y)

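The first problem is that X and y are NumPy arrays, not torch Variables. Convert them to float tensors first, because Variable only wraps tensors and the RNN weights are float32:

X = Variable(torch.from_numpy(X).float())
y = Variable(torch.from_numpy(y).float())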

The next problem is that X must have shape (timesteps, batches, features). In your case batches is 1 and features is 1, so each window of 5 values needs shape (5, 1, 1).
The forward method of your model correctly selects only the last timestep of the RNN output, so the model output will have shape (batches, output_size).
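
To make the shape requirement concrete, here is a minimal sketch. It assumes a recent PyTorch version, where plain tensors are used instead of Variable. The window values and layer sizes are made up for illustration; only the shapes matter.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical window of 5 differenced values, shaped (timesteps, features)
# like X[0, :, :] in the question.
window = np.array([[0.1], [0.3], [-0.2], [0.4], [0.0]])

# NumPy defaults to float64; nn.RNN weights are float32, so cast explicitly.
x = torch.from_numpy(window).float()
# Insert the batch dimension: (timesteps, features) -> (timesteps, batch, features)
x = x.unsqueeze(1)

rnn = nn.RNN(input_size=1, hidden_size=20)
dense = nn.Linear(20, 1)
hidden = torch.zeros(1, 1, 20)  # (num_layers, batch, hidden_size)

out, hidden = rnn(x, hidden)    # out: (timesteps, batch, hidden_size)
last = out[-1]                  # last timestep: (batch, hidden_size)
pred = dense(last)              # (batch, output_size)

print(x.shape, out.shape, last.shape, pred.shape)
```

The unsqueeze(1) call is one way to add the batch axis; x.view(5, 1, 1) would work equally well here.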

The third problem is that the model uses a softmax after the dense layer. The dense layer outputs a vector of one feature, and softmax ensures that this vector sums to one. Your model can only output a vector containing the single value [1].
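
A quick way to see this is to apply softmax to a few single-element outputs. The sketch below uses arbitrary made-up values in place of the dense layer output and assumes a PyTorch version where F.softmax accepts the dim argument.

```python
import torch
import torch.nn.functional as F

# Hypothetical dense-layer outputs, each of shape (batch=1, features=1).
values = (-5.0, 0.0, 42.0)

outputs = []
for value in values:
    logits = torch.tensor([[value]])
    # Normalize over the feature dimension, which has a single entry here.
    probs = F.softmax(logits, dim=1)
    outputs.append(probs.item())

# Whatever the input value, the single softmax output is the same.
print(outputs)
```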

Softmax is only used for “classification” problems, but your data is set up as a “regression” problem: your model receives several values of diff_data and must try to predict the next value of diff_data. For regression, return the output of the dense layer directly, without softmax.
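
As a rough sketch of the regression setup, the model below mirrors the one in the question but returns the dense layer output without softmax. The class name Regressor, the data values, and the choice to reset the hidden state on every pass are assumptions for illustration, and it assumes a recent PyTorch version without Variable.

```python
import math
import torch
import torch.nn as nn

class Regressor(nn.Module):  # hypothetical name, mirrors `mod` from the question
    def __init__(self, feature_size, hidden_size, output_size):
        super().__init__()
        self.rnn1 = nn.RNN(input_size=feature_size, hidden_size=hidden_size)
        self.dense1 = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden):
        out, hidden = self.rnn1(x, hidden)
        # Last timestep only, and no softmax: the raw linear output is the prediction.
        return self.dense1(out[-1])

torch.manual_seed(0)
model = Regressor(feature_size=1, hidden_size=20, output_size=1)
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Made-up data: one window of 5 values and its target, already float tensors.
x = torch.tensor([0.1, 0.3, -0.2, 0.4, 0.0]).view(5, 1, 1)  # (timesteps, batch, features)
y = torch.tensor([[0.2]])                                    # (batch, output_size)

losses = []
for epoch in range(10):
    hidden = torch.zeros(1, 1, 20)  # fresh hidden state for each pass
    optimizer.zero_grad()
    output = model(x, hidden)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Because output and y both have shape (batches, output_size), L1Loss compares them element by element without any broadcasting surprises.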

A “classification” problem would be to predict “will the price go up or down during the next day” which is a simple binary choice. In that case, softmax would be appropriate.

You could also have multiple classes, e.g. “price will go up a lot”, “price will go up a little”, “price will go down a little”, “price will go down a lot”, where the threshold between “a lot” and “a little” is set in advance. Softmax would be appropriate here too.
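
To illustrate what the classification targets might look like, here is a small sketch that turns price differences into four class labels. The function name label_change, the threshold of 50.0, the sample differences, and the decision to count a difference of exactly zero as “up a little” are all assumptions for illustration.

```python
def label_change(diff, threshold):
    """Map a price difference to one of four class indices.

    0: down a lot, 1: down a little, 2: up a little, 3: up a lot.
    `threshold` separates "a little" from "a lot" and is fixed in advance.
    """
    if diff <= -threshold:
        return 0
    if diff < 0:
        return 1
    if diff < threshold:
        return 2
    return 3

diffs = [-120.0, -3.5, 0.0, 7.25, 250.0]  # made-up daily differences
labels = [label_change(d, threshold=50.0) for d in diffs]
print(labels)  # [0, 1, 2, 2, 3]
```

With labels like these, the dense layer would output four scores per sample instead of one. Note that nn.CrossEntropyLoss applies log-softmax internally, so when training with that loss the model should return the raw scores.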
