I am new to PyTorch and I am trying to build a reinforcement learning system that uses OpenAI Gym to predict whether or not a stock should be bought, and at what time.
import torch
import torch.nn as nn

# StockEnv (my environment class) and device are defined elsewhere.


class NeuronalNetwork(nn.Module):
    def __init__(self, stock_env: StockEnv):
        super(NeuronalNetwork, self).__init__()
        self.stock_env = stock_env
        input_size = len(self.stock_env.normalized_dataframe.columns)
        self.hidden_size = 128
        self.num_layers = 4
        self.kernel = 2
        output_size = self.stock_env.action_space.n
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=self.hidden_size,
                            num_layers=self.num_layers, batch_first=True)
        self.output_layer = nn.Linear(self.hidden_size, output_size)
        # dim must be a tensor dimension (here the last one, the actions),
        # not the number of actions
        self.softmax = nn.LogSoftmax(dim=-1)
        self.tanh = nn.Tanh()

    def forward(self, x, hidden=None):
        # N x T x D
        # N - the number of windows (batch size)
        # T - the window size
        # D - the number of indicators and OHLCV in total
        if len(x.shape) > 2:
            batch_size = x.shape[0]
        else:
            batch_size = 1
        if hidden is None:
            hidden = (
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
                torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
            )
        D = len(self.stock_env.normalized_dataframe.columns)
        T = self.stock_env.window_size
        N = batch_size
        x = x.view(N, T, D).type(torch.FloatTensor).to(device)
        out, (ht, ct) = self.lstm(x, hidden)  # out: N x T x hidden_size
        out = self.tanh(out)
        out = self.output_layer(out)          # out: N x T x output_size
        return out
The x passed to forward holds my data in the shape [Number_of_batches x Window_size x Features].
For the moment, out has the shape [Number_of_batches x Window_size x Action], but what I want the model to learn is to predict the best action ONLY for the 250th element. Does anyone know what I can do to obtain an out with a shape of (batch_size x action), where the action is the one for the last element along the window_size dimension?
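# out.shape => (batch_size, window_size, actions)
new_out = []
for b in range(out.shape[0]):        # for all batch elements
    batch = []
    for a in range(out.shape[2]):    # for all actions
        batch.append(out[b][-1][a])  # last element of the window
    new_out.append(batch)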
In the end, I would have an out of shape batch_size x action, where the action is only the action for the 250th element of the window.
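To make the selection I am describing concrete, here is a minimal sketch that stands apart from my real model. All sizes (8 windows, 250 steps, 6 features, 3 actions) are made-up values for illustration only, and I am assuming that plain tensor slicing along the window dimension is what is needed here:

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration (not from my real environment).
batch_size, window_size, num_features, num_actions = 8, 250, 6, 3
hidden_size = 128

lstm = nn.LSTM(input_size=num_features, hidden_size=hidden_size,
               num_layers=4, batch_first=True)
output_layer = nn.Linear(hidden_size, num_actions)

x = torch.randn(batch_size, window_size, num_features)  # N x T x D

out, (ht, ct) = lstm(x)      # out: N x T x hidden_size
out = torch.tanh(out)
out = output_layer(out)      # out: N x T x num_actions

# Keep only the last time step (the 250th element of each window).
last_out = out[:, -1, :]     # N x num_actions

# The same result built with explicit loops, for comparison.
looped = torch.stack([
    torch.stack([out[b, -1, a] for a in range(num_actions)])
    for b in range(batch_size)
])
```

Since the tanh and the linear layer act on each time step independently, slicing the LSTM output first and then applying them to the resulting N x hidden_size tensor should give the same values while skipping the other 249 steps.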
I'm not sure whether my question makes sense to you; if it doesn't, just let me know and I will try to explain it differently.