I am new to PyTorch and I am trying to build a reinforcement learning system that uses OpenAI for trying to predict whether or not a stock should be bought or not and at what time.
class NeuronalNetwork(nn.Module):
def __init__(self, stock_env: StockEnv):
super(NeuronalNetwork, self).__init__()
self.stock_env = stock_env
input_size = len(self.stock_env.normalized_dataframe.columns)
self.hidden_size = 128
self.num_layers = 4
self.kernel = 2
output_size = self.stock_env.action_space.n
self.lstm = nn.LSTM(input_size=input_size, hidden_size=self.hidden_size, num_layers=self.num_layers, batch_first=True)
self.output_layer = nn.Linear(self.hidden_size, output_size)
self.softmax = nn.LogSoftmax(dim=output_size)
self.tanh = nn.Tanh()
def forward(self, x, hidden=None):
# N x T x D
# N - the number of windows sizes
# T - the window size
# D - the number of indicators and OHLCV in total
if len(x.shape) > 2:
batch_size = x.shape[0]
else:
batch_size = 1
if hidden is None:
hidden = (
torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
torch.zeros(self.num_layers, batch_size, self.hidden_size).to(device),
)
D = len(self.stock_env.normalized_dataframe.columns)
T = self.stock_env.window_size
N = batch_size
x = x.view(N, T, D).type(torch.FloatTensor).to(device)
out, (ht, ct) = self.lstm(x, hidden)
out = self.tanh(out)
out = self.output_layer(out)
return out
My x from forward
is representing my data under the form [Number_of_batches x Window_size x Features]
For the moment my out
will be the shape Number_of_batches x Window_size x Action
but what I want to make my model learn is to predict the best action ONLY for the 250th element. So does anyone know what can I do in order to obtain an out
with a shape of (batch_size x action) where the action is going to be the last element from the column window_sie?
Example:
out.shape => (batch_size, windows_size, features)
FOR **b** all batch_size:
batch = []
FOR _ all actions
batch.append(out[b][250])
new_out.append(batch)
And on the end, I will have an out
that is going to be batch_size x action where the action is only going to be the action of the 250th element from the window_size.
I’m not sure if makes sense for you what is my question, but it doesn’t just let me know and I will try to explain it differently.