Context: My project revolves around John Conway’s Game of Life. I am trying to predict patterns of ‘spaceships’.
I was not sure what category to put this in, as I’m still trying to find the right search terms for my question. Essentially, I have a target matrix T (always 2D). Every value of that matrix is the probability (between 0 and 1) of an object (cell) being present at those coordinates. The neural network’s input is a matrix I with the same dimensions as T. The network should output another matrix O, also the same size as I and T, which represents the change needed to turn I into T, i.e. we want I + O = T. There is no correlation between the input matrices (they are independent of each other).
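To make the relationship concrete, here is a minimal sketch of the residual formulation on a 3x3 grid, assuming PyTorch (which the network code uses). The helper names `residual_target` and `reconstruct` are hypothetical, and clamping the reconstruction to [0, 1] is an assumption added for illustration, not part of the setup described above. Because I and T both lie in [0, 1], every entry of O = T - I lies in [-1, 1].

```python
import torch


def residual_target(I: torch.Tensor, T: torch.Tensor) -> torch.Tensor:
    """Training target O = T - I for the residual formulation I + O = T."""
    return T - I


def reconstruct(I: torch.Tensor, O: torch.Tensor) -> torch.Tensor:
    """Rebuild a probability grid from I and a (predicted) change O.

    The clamp to [0, 1] is an assumption: it keeps the result a valid
    probability grid even if a network overshoots.
    """
    return (I + O).clamp(0.0, 1.0)


# Tiny example: T has one live cell in the centre, I is a weak uniform guess.
T = torch.zeros(3, 3)
T[1, 1] = 1.0
I = torch.full((3, 3), 0.2)

O = residual_target(I, T)  # -0.2 everywhere except +0.8 in the centre
T_rebuilt = reconstruct(I, O)
```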
The dataset I use is generated by rearranging that equation. Depending on which dataset I use, the matrix I is some incomplete probability grid (random, close to the target matrix T, or empty). The expected output matrix is then T - I, so during training the network gets I as the input and T - I as the target. I have also tried using Adaptive2dConvolutions, followed by a fully connected layer, and then resizing the output back to the input size, but that did not seem to work either, although my code may have been incorrect.
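The dataset generation described above could be sketched as follows. This is an illustration under assumptions: T is a fixed glider on an 8x8 grid (a hypothetical choice), the three I generators (`"random"`, `"close"`, `"empty"`) are guesses at what those datasets contain, and `make_pair` / `make_batch` are hypothetical helper names.

```python
import torch


def make_pair(T: torch.Tensor, mode: str, noise: float = 0.1):
    """Return one (I, T - I) pair: the network sees I and must output T - I.

    Hypothetical generators for I:
      "random": uniform probabilities in [0, 1]
      "close":  T plus Gaussian noise, clamped back to [0, 1]
      "empty":  all zeros
    """
    if mode == "random":
        I = torch.rand_like(T)
    elif mode == "close":
        I = (T + noise * torch.randn_like(T)).clamp(0.0, 1.0)
    elif mode == "empty":
        I = torch.zeros_like(T)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return I, T - I


def make_batch(T: torch.Tensor, mode: str, batch_size: int):
    # Stack pairs into (batch, 1, H, W) tensors, the layout nn.Conv2d expects.
    pairs = [make_pair(T, mode) for _ in range(batch_size)]
    inputs = torch.stack([i for i, _ in pairs]).unsqueeze(1)
    targets = torch.stack([o for _, o in pairs]).unsqueeze(1)
    return inputs, targets


# Hypothetical target T: a glider drawn in the corner of an 8x8 grid.
T = torch.zeros(8, 8)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    T[r, c] = 1.0

inputs, targets = make_batch(T, "close", batch_size=4)
```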
I’m trying to teach the network how to change the values of the input matrix to reach the target matrix. Currently, I’m using a stack of convolutions activated by swish (SiLU) functions, with mean squared error as the loss function. However, I would like to know whether there have been problems similar to mine, as so far I have had no luck finding information on this specific topic.
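A minimal training loop for a conv-plus-swish model with an MSE loss might look like this. The layer widths, the Adam optimizer, the learning rate, the 8x8 glider target and the random inputs are all assumptions for illustration. The detail that matters is `padding=1`: it keeps each feature map the same size as the input grid, so O lines up cell for cell with I.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stacked 3x3 convolutions with SiLU (swish) activations.
# padding=1 preserves H x W, so the output can be added to I element-wise.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.SiLU(),
    nn.Conv2d(16, 1, 3, padding=1),  # no final activation: O may be negative
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Hypothetical data: a glider target T and a fixed batch of random inputs I.
T = torch.zeros(1, 1, 8, 8)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    T[0, 0, r, c] = 1.0
I = torch.rand(16, 1, 8, 8)
target = T - I  # broadcasts to (16, 1, 8, 8)

losses = []
for step in range(200):
    optimizer.zero_grad()
    O = model(I)
    loss = loss_fn(O, target)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Overfitting one fixed batch like this is a common sanity check: if the loss cannot drop even here, the architecture or data pipeline is a more likely culprit than the loss function.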
My loss is fairly low, depending on the dataset, but does not improve over time. Furthermore, when I test my network, it does not work at all. For example, given an input matrix that is almost the same as T, the output does not increase the probability at the correct coordinates. Likewise, given an empty matrix, the output does not increase the probability at the coordinates it should.
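The test described above (an input that is almost T, with some live cells missing) can be turned into a small numeric check. The sketch is hypothetical: `missing_cell_report`, the 0.5 threshold for "live", and the `Oracle` stand-in model (which simply returns T - I) exist only to show what a correct prediction would report.

```python
import torch
import torch.nn as nn


def missing_cell_report(model, T, I, threshold=0.5):
    """Return (mean predicted change at cells live in T but missing from I,
    mean absolute predicted change everywhere else).
    The 0.5 threshold for 'live' is an assumption."""
    model.eval()
    with torch.no_grad():
        O = model(I)
    missing = (T > threshold) & (I <= threshold)
    gain = O[missing].mean().item() if missing.any() else float("nan")
    other = O[~missing].abs().mean().item()
    return gain, other


class Oracle(nn.Module):
    # Hypothetical perfect model, used only to sanity-check the report.
    def __init__(self, T):
        super().__init__()
        self.register_buffer("T", T)

    def forward(self, x):
        return self.T - x


# Glider target; the input is the same grid with one live cell removed.
T = torch.zeros(1, 1, 8, 8)
for r, c in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    T[0, 0, r, c] = 1.0
I = T.clone()
I[0, 0, 2, 2] = 0.0

gain, other = missing_cell_report(Oracle(T), T, I)
```

A trained network passed in place of `Oracle` would ideally report a gain close to 1 and a small value elsewhere; a gain near 0 corresponds to the behaviour described above.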
More specifically, my questions are:
- Are there any similar projects I can draw inspiration from?
- What sort of general model should I use? (convolutions only, fully connected, something else?)
- Is MSE the correct loss function that I should be using?
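For the fully connected variant mentioned earlier, here is one way the pieces could fit together. This assumes that "Adaptive2dConvolutions" refers to PyTorch's adaptive pooling layers (`nn.AdaptiveAvgPool2d` here); `PooledFC`, its sizes, and the use of `F.interpolate` to resize back to the input grid are hypothetical choices, not a known-good design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PooledFC(nn.Module):
    """Hypothetical conv -> adaptive pool -> fully connected -> resize model.

    The adaptive pool maps any H x W input to a fixed pool_size x pool_size
    grid, the linear layers predict a fixed out_size x out_size change, and
    F.interpolate resizes that prediction back to the input's H x W.
    """

    def __init__(self, pool_size=8, out_size=16, channels=16):
        super().__init__()
        self.out_size = out_size
        self.features = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.SiLU(),
            nn.AdaptiveAvgPool2d(pool_size),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * pool_size * pool_size, 256), nn.SiLU(),
            nn.Linear(256, out_size * out_size),
        )

    def forward(self, x):
        h, w = x.shape[-2:]
        o = self.fc(self.features(x))
        o = o.view(-1, 1, self.out_size, self.out_size)
        # Resize the fixed-size prediction back to the input grid size.
        return F.interpolate(o, size=(h, w), mode="bilinear", align_corners=False)


model = PooledFC()
O = model(torch.rand(2, 1, 20, 30))  # output has the same shape as the input
```

One trade-off to keep in mind: the bilinear resize spreads each predicted value over neighbouring cells, which may blur the cell-level detail a Life grid depends on, whereas a purely convolutional network with padding keeps a one-to-one correspondence between input and output cells.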
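Here’s the code for my network:

import torch.nn as nn

class ProbabilityFinder(nn.Module):
    def __init__(self, batch_size, kernel_size=3):
        super().__init__()
        self.batch_size = batch_size
        # nn.Conv2d's third positional argument is kernel_size, not the batch size.
        # padding=kernel_size // 2 keeps the output grid the same H x W as the input (odd kernels).
        pad = kernel_size // 2
        self.conv1 = nn.Conv2d(1, 3, kernel_size, padding=pad)
        self.conv2 = nn.Conv2d(3, 12, kernel_size, padding=pad)
        self.conv3 = nn.Conv2d(12, 24, kernel_size, padding=pad)
        self.conv4 = nn.Conv2d(24, 48, kernel_size, padding=pad)
        self.finalConv = nn.Conv2d(48, 1, kernel_size, padding=pad)
        self.silu = nn.SiLU()

    def forward(self, x):
        # x: (batch, 1, H, W) input grid I; the result is the predicted change O = T - I.
        x = self.silu(self.conv1(x))
        x = self.silu(self.conv2(x))
        x = self.silu(self.conv3(x))
        x = self.silu(self.conv4(x))
        # No activation on the last layer, so O can take negative values.
        x = self.finalConv(x)
        return x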
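The third positional argument of `nn.Conv2d` is `kernel_size`, and together with `padding` it determines the output grid size. This small check (`output_size` is a hypothetical helper) pushes a 16x16 grid through five stacked convolutions: without padding, each layer trims `kernel_size - 1` cells per dimension, so the output can no longer be added element-wise to I.

```python
import torch
import torch.nn as nn


def output_size(kernel_size, padding, size=16, layers=5):
    # Push a dummy grid through `layers` stacked convolutions and
    # report the spatial size that comes out.
    x = torch.zeros(1, 1, size, size)
    for _ in range(layers):
        x = nn.Conv2d(1, 1, kernel_size, padding=padding)(x)
    return tuple(x.shape[-2:])


print(output_size(kernel_size=3, padding=0))  # (6, 6): each layer trims 2 cells per dimension
print(output_size(kernel_size=3, padding=1))  # (16, 16): padding = kernel_size // 2 preserves size
```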