Hyperparameter Search for Connect Four (DDQN)

Hey guys, I just wrote some code to train a DDQN to play Connect Four. As this is my first experience with DDQNs, I have no idea which hyperparameters could roughly work out if I train the network long enough. My convolutional network looks as follows:

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # input: the 6x7 board as a single channel
        self.process_cnn = nn.Sequential(
            # 6x7 -> 5x6 (formula: out = (in - k + 2*p) / s + 1)
            nn.Conv2d(1, 16, 4, stride=1, padding=1),
            nn.ReLU(),
            # 5x6 -> 5x6 (kernel 3 with padding 1 preserves the size)
            nn.Conv2d(16, 32, 3, stride=1, padding=1),
            nn.ReLU(),
        )
        self.process_lin = nn.Sequential(
            nn.Linear(5 * 6 * 32, 64),
            nn.ReLU(),
            nn.Linear(64, 7),
        )

    def process(self, x):
        # Apply convolutions
        x = self.process_cnn(x)
        # Flatten
        x = x.view([x.size(0), -1])
        # Apply linear layers
        x = self.process_lin(x)
        return x

    def forward(self, x):
        x = self.process(x)
        return x
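For context on how the discount and the target network enter the training loop, here is a minimal sketch of the double-DQN target computation. The names (`online_net`, `target_net`, `gamma`) and the batch layout are illustrative assumptions, not part of the code above: the online network selects the next action, while the target network evaluates it.

```python
import torch

# Sketch of the double-DQN target computation (names like online_net,
# target_net, and gamma=0.99 are illustrative, not tuned values).
def ddqn_targets(online_net, target_net, rewards, next_states, dones, gamma=0.99):
    with torch.no_grad():
        # action selection with the online network
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # action evaluation with the target network
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        # terminal transitions (dones == 1) get no bootstrapped value
        return rewards + gamma * next_q * (1.0 - dones)
```

The target network would then be synchronized with the online network every fixed number of updates (a hard copy of the weights), which is exactly the update interval being asked about.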

From my experience, the number and size of the kernels are not too important for convergence, as long as they can cover the complexity of the problem. What I am specifically interested in is: what batch size should I use, and after how many batches should I update the target network? Some approximate values for the learning rate, the epsilon value, and the discount factor would also be very helpful. Currently I am trying out linearly decaying epsilon values and learning rates - is this in general a good idea?
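For concreteness, here is a minimal sketch of the kind of linear epsilon decay I mean. The start/end values and the decay horizon are illustrative placeholders, not tuned numbers:

```python
# Linearly anneal epsilon from eps_start to eps_end over decay_steps,
# then hold it constant at eps_end (all values here are placeholders).
def linear_epsilon(step, eps_start=1.0, eps_end=0.05, decay_steps=100_000):
    if step >= decay_steps:
        return eps_end
    frac = step / decay_steps
    return eps_start + frac * (eps_end - eps_start)

print(linear_epsilon(0))        # 1.0 (fully random at the start)
print(linear_epsilon(50_000))   # halfway: 0.525
print(linear_epsilon(200_000))  # 0.05 (held constant after decay)
```

For the learning rate, PyTorch's `torch.optim.lr_scheduler.LinearLR` implements the analogous linear schedule on the optimizer side.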

Thank you very much!!