Hey guys, I just wrote some code to train a DDQN to play Connect Four. Since this is my first experience with DDQNs, I have no idea which hyperparameters could roughly work out if I train the network long enough. My convolutional network looks as follows:
```python
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):  # processes a 6x7 board
        super(CNN, self).__init__()
        self.process_cnn = nn.Sequential(
            # 6x7 -> 5x6 (formula: out = (in - k + 2*p) / s + 1)
            nn.Conv2d(1, 16, 4, stride=1, padding=1),
            nn.ReLU(True),
            # 5x6 -> 5x6 (kernel 3 with padding 1 preserves the size)
            nn.Conv2d(16, 32, 3, stride=1, padding=1),
            nn.ReLU(True)
        )
        self.process_lin = nn.Sequential(
            nn.Linear(5 * 6 * 32, 64),
            nn.ReLU(True),
            nn.Linear(64, 7)  # one Q-value per column
        )

    def process(self, x):
        # Apply convolutions
        x = self.process_cnn(x)
        # Flatten
        x = x.view([x.size(0), -1])
        # Apply linear layers
        x = self.process_lin(x)
        return x

    def forward(self, x):
        return self.process(x)
```
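To double-check the layer sizes, the output-shape formula from the comment can be evaluated in plain Python (no torch needed) — this is just a sanity check of the dimensions above, not part of the training code:

```python
def conv_out(size, kernel, stride=1, padding=1):
    """Spatial output size of a conv layer: out = (in - k + 2*p) / s + 1."""
    return (size - kernel + 2 * padding) // stride + 1

h, w = 6, 7                              # Connect Four board
h, w = conv_out(h, 4), conv_out(w, 4)    # first conv, k=4, p=1 -> 5x6
h, w = conv_out(h, 3), conv_out(w, 3)    # second conv, k=3, p=1 -> still 5x6
print(h, w, h * w * 32)                  # flattened size feeds nn.Linear(5 * 6 * 32, 64)
```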
From my experience, the number and size of the kernels are not too important for convergence, as long as they can cover the complexity of the problem. What I am specifically interested in is: what batch size should I use, and after how many batches should I update the target network? Some approximate values for the learning rate, the epsilon value, and the discount factor would also be very helpful. Currently I am trying out linearly decaying epsilon values and learning rates - is this in general a good idea?
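For concreteness, this is roughly how I'm scheduling epsilon (the start/end values and step count here are placeholders I picked, not tuned for Connect Four):

```python
def linear_schedule(step, start=1.0, end=0.05, decay_steps=50_000):
    """Linearly anneal from `start` to `end` over `decay_steps` steps, then hold."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# epsilon at a few points during training
print(linear_schedule(0), linear_schedule(25_000), linear_schedule(100_000))
```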
Thank you very much!!