Can an LSTM be used in a non-A3C model? / Need help figuring out how to train this model

Hi,
I am still new to machine learning. I am trying to make an AI for a fighting game, and I am having trouble training this model.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class Ai(torch.nn.Module):

    def __init__(self, action_number):
        super(Ai, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv3 = nn.Conv2d(32, 32, 3, stride=2, padding=1)
        self.conv4 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        self.lstm = nn.LSTMCell(self.count_neurons((480, 270)), 256)
        self.fc1 = nn.Linear(256, action_number)
        self.train()

    def forward(self, inputs):
        inputs, (hx, cx) = inputs
        x = F.elu(self.conv1(inputs))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.conv4(x))
        x = x.view(x.size(0), -1)
        hx, cx = self.lstm(x, (hx, cx))
        x = F.elu(self.fc1(hx))
        return x, (hx, cx)

    def count_neurons(self, image_dim):
        # Pass a dummy 1x1xHxW image through the conv stack to find the flattened feature size.
        x = Variable(torch.rand(1, 1, *image_dim))
        x = F.elu(self.conv1(x))
        x = F.elu(self.conv2(x))
        x = F.elu(self.conv3(x))
        x = F.elu(self.conv4(x))
        return x.data.view(1, -1).size(1)

Here is the training file:

#Training the model

import torch
import torch.nn.functional as F
import time
from collections import deque
from torch.autograd import Variable
from model import Ai
from functions import get_reward, get_screen, get_action_number, countdown, print_debug, do_action

def train():
    action_number = get_action_number()
    model = Ai(action_number)
    first = True
    last_time = time.time()
    while True:
        if first:
            #TODO load saved model
            cx = Variable(torch.zeros(1, 256), volatile=True)
            hx = Variable(torch.zeros(1, 256), volatile=True)
            first = False  # only initialize the LSTM state once, otherwise it resets every frame
        else:
            cx = Variable(cx.data, volatile=True)
            hx = Variable(hx.data, volatile=True)
        img = torch.Tensor(get_screen())
        action_values, (hx, cx) = model((Variable(img.unsqueeze(0), volatile=True), (hx, cx)))
        prob = F.softmax(action_values, dim=1)
        action = prob.multinomial(1).data.numpy()  # sample one action index from the policy
        do_action(action[0][0])
        reward = get_reward()
        print_debug(last_time=last_time, action=action, reward=reward)
        last_time = time.time()


#countdown()
train()

Here is the get_screen() function:

import cv2
import numpy as np

def get_screen():
    # grab_screen is the screen-capture helper imported elsewhere in the project
    img = grab_screen(region=(1920, 93, 3200, 813))
    img = cv2.resize(img, (480, 270))  # cv2.resize takes (width, height), so the array shape is 270x480
    img = np.transpose(img.reshape(270, 480, 1), (2, 0, 1))  # add the channel axis -> (1, 270, 480)
    return img

So I still don't have any idea how to train it. I used the pytorch-a3c implementation from GitHub, but it uses the value_loss and advantage, which I can't use since this is not an A3C model. Thank you for your time.

To be clear, you are training an agent to act in an environment in which it occasionally receives rewards for its actions, right?

In such an environment it is common to use DQN, A3C or any number of other training techniques.

I believe that a model suitable for A3C needs to produce two outputs at each step: one suggesting actions (the actor) and another estimating the value of the current state (the critic). Your model seems to be predicting only the actions, in which case DQN or one of its variants seems more appropriate than A3C.
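
For contrast, here is a rough sketch of the two heads an actor-critic model would carry (the class and attribute names here are just illustrative, not taken from your code):

import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, feature_dim, action_number):
        super(ActorCritic, self).__init__()
        self.actor = nn.Linear(feature_dim, action_number)  # action logits (the actor)
        self.critic = nn.Linear(feature_dim, 1)              # scalar state value V(s) (the critic)

    def forward(self, features):
        return self.actor(features), self.critic(features)

The value_loss and advantage terms you saw in the pytorch-a3c code come from that second, critic output, which is why they do not apply to a model that only outputs actions.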

Hi, thanks for the answer. I can't use the A3C model because I can't run a lot of environments in parallel. I don't know if it's okay to use an LSTM inside the model. Can you link to or give an example of training a DQN model like this? Thank you for your time.

You could try looking through the official tutorials. Here is one: http://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
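
To give you a feel for what that tutorial builds, here is a rough sketch of a DQN-style update, using a tiny feed-forward Q-network on a toy 4-dimensional state instead of your conv/LSTM model. All of the names and hyperparameters below are illustrative assumptions (and it is written against the current torch API), so treat it as a starting point rather than a drop-in solution:

import random
from collections import deque, namedtuple

import torch
import torch.nn as nn
import torch.nn.functional as F

Transition = namedtuple('Transition', ('state', 'action', 'reward', 'next_state', 'done'))

class QNet(nn.Module):
    # Tiny Q-network over a toy 4-dim state; swap in your conv stack for pixel input.
    def __init__(self, n_actions):
        super(QNet, self).__init__()
        self.fc1 = nn.Linear(4, 64)
        self.head = nn.Linear(64, n_actions)

    def forward(self, x):
        return self.head(F.relu(self.fc1(x)))  # raw Q-values, one per action (no softmax)

n_actions = 6
policy_net = QNet(n_actions)
target_net = QNet(n_actions)
target_net.load_state_dict(policy_net.state_dict())  # target network starts as a copy
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-4)
memory = deque(maxlen=10000)  # replay buffer of Transition tuples
gamma, batch_size, eps = 0.99, 32, 0.1

def select_action(state):
    # epsilon-greedy: explore with probability eps, otherwise take the greedy action
    if random.random() < eps:
        return random.randrange(n_actions)
    with torch.no_grad():
        return policy_net(state.unsqueeze(0)).argmax(dim=1).item()

def optimize():
    if len(memory) < batch_size:
        return
    batch = Transition(*zip(*random.sample(memory, batch_size)))
    state = torch.stack(batch.state)
    action = torch.tensor(batch.action).unsqueeze(1)
    reward = torch.tensor(batch.reward, dtype=torch.float32)
    next_state = torch.stack(batch.next_state)
    done = torch.tensor(batch.done, dtype=torch.float32)

    q = policy_net(state).gather(1, action).squeeze(1)   # Q(s, a) for the actions actually taken
    with torch.no_grad():
        next_q = target_net(next_state).max(dim=1)[0]    # max_a' Q_target(s', a')
    target = reward + gamma * next_q * (1.0 - done)      # no bootstrap past the end of an episode

    loss = F.smooth_l1_loss(q, target)                   # TD error (Huber loss)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

The important differences from your current loop are that the network output is treated directly as Q-values (no softmax/multinomial), every (state, action, reward, next_state) transition goes into the replay buffer, and the loss is the TD error between Q(s, a) and reward + gamma * max Q(s', a'). Periodically copying policy_net into target_net keeps the targets stable.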