Hey guys, I just wrote code to train a DDQN (Double DQN) to learn Connect Four. Since this is my first experience with DDQNs, I have no idea which hyperparameters would roughly work out if I train the network long enough. My convolutional network looks as follows:

```
import torch.nn as nn


class CNN(nn.Module):
    def __init__(self):
        # Input: 1 x 6 x 7 board (channels x rows x columns)
        super().__init__()
        self.process_cnn = nn.Sequential(
            # 6x7 -> 5x6 (formula: out = floor((in - k + 2*p) / s) + 1)
            nn.Conv2d(1, 16, 4, stride=1, padding=1),
            nn.ReLU(True),
            # 5x6 -> 5x6 (kernel 3 with padding 1 keeps the size)
            nn.Conv2d(16, 32, 3, stride=1, padding=1),
            nn.ReLU(True),
        )
        self.process_lin = nn.Sequential(
            # 32 channels * 5 rows * 6 columns after the conv stack
            nn.Linear(5 * 6 * 32, 64),
            nn.ReLU(True),
            # One Q-value per column (7 possible moves)
            nn.Linear(64, 7),
        )

    def process(self, x):
        # Apply convolutions
        x = self.process_cnn(x)
        # Flatten everything except the batch dimension
        x = x.view(x.size(0), -1)
        # Apply linear layers
        x = self.process_lin(x)
        return x

    def forward(self, x):
        return self.process(x)
```
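A quick way to double-check the shape comments is to apply the same formula in plain Python. This is a small sketch; `conv2d_out` and `trace_shapes` are hypothetical helper names, and the formula assumes dilation 1 (PyTorch's default for `nn.Conv2d`). It shows that the second convolution keeps the 5x6 size, which is why `nn.Linear(5 * 6 * 32, 64)` matches the flattened output.

```python
def conv2d_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of one spatial dimension of a 2D convolution (assumes dilation=1)."""
    # floor((in + 2*p - k) / s) + 1
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(height: int = 6, width: int = 7):
    """Track (H, W) through the two conv layers of the CNN above."""
    shapes = [(height, width)]
    # Layer 1: kernel 4, stride 1, padding 1
    height, width = conv2d_out(height, 4, 1, 1), conv2d_out(width, 4, 1, 1)
    shapes.append((height, width))
    # Layer 2: kernel 3, stride 1, padding 1 (size-preserving for this kernel)
    height, width = conv2d_out(height, 3, 1, 1), conv2d_out(width, 3, 1, 1)
    shapes.append((height, width))
    return shapes


shapes = trace_shapes()
print(shapes)  # [(6, 7), (5, 6), (5, 6)]
print(32 * shapes[-1][0] * shapes[-1][1])  # 960, i.e. 5 * 6 * 32
```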

From my experience, the number and size of the kernels matter less for convergence, as long as the network has enough capacity for the complexity of the problem. What I am specifically interested in is which batch size I should use, and after how many batches I should update the target network. Approximate values for the learning rate, epsilon, and the discount factor would also be very helpful. Currently I am trying linearly decaying epsilon values and learning rates; is this generally a good idea?
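For reference, here is a minimal, framework-free sketch of the pieces these questions touch on: a linear decay schedule (usable for epsilon, or in principle a learning rate), epsilon-greedy move selection restricted to non-full columns, the Double DQN target, and a hard target-network sync every N steps. All function names are hypothetical, and the numbers in the usage lines are placeholders, not recommended settings. It assumes each stored transition's next state is a position where the same agent moves again (i.e. after the opponent has replied); if positions alternate perspective between the two players, the bootstrap term needs its sign handled differently. In PyTorch itself, the sync would be `target_net.load_state_dict(online_net.state_dict())`, and `torch.optim.lr_scheduler.LinearLR` provides a linear learning-rate decay.

```python
import random


def linear_schedule(step, start, end, decay_steps):
    """Linearly interpolate from `start` to `end` over `decay_steps`, then hold `end`."""
    if step >= decay_steps:
        return end
    return start + (step / decay_steps) * (end - start)


def epsilon_greedy(q_values, epsilon, legal_moves, rng=random):
    """With probability epsilon pick a random legal column, otherwise the best legal one."""
    if rng.random() < epsilon:
        return rng.choice(legal_moves)
    return max(legal_moves, key=lambda a: q_values[a])


def double_dqn_target(reward, done, gamma, q_online_next, q_target_next, legal_next):
    """Double DQN: the online net selects the next action, the target net evaluates it."""
    if done:
        # Terminal transition: no bootstrapping
        return reward
    best = max(legal_next, key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def should_sync_target(step, every):
    """Hard update schedule: copy online weights to the target net every `every` steps."""
    return step > 0 and step % every == 0


# Usage with placeholder numbers (not tuned for Connect Four)
print(round(linear_schedule(5_000, start=1.0, end=0.05, decay_steps=10_000), 3))  # 0.525
print(epsilon_greedy([0.1, 0.9, 0.5], epsilon=0.0, legal_moves=[0, 2]))  # 2
print(round(double_dqn_target(0.0, False, 0.9, [0.1, 0.8, 0.3], [0.5, 0.2, 0.9], [0, 1, 2]), 2))  # 0.18
```

The last line shows what distinguishes Double DQN from plain DQN: the online network picks column 1, so the target uses the target network's value 0.2 for that column rather than the target network's own maximum of 0.9.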

Thank you very much!!