Training a probability network

Context: My project revolves around John Conway’s Game of Life. I am trying to predict patterns of ‘spaceships’.

I was not sure what category to put this in, as I’m trying to find the correct search terms for my question. Essentially, I have a target matrix T (which will always be 2D). Every value of that matrix is a probability (between 0 and 1) of an object (cell) being present at those coordinates. The neural network’s input is a matrix I with the same dimensions as T. The network should output another matrix O (the same size as I and T) which represents the change needed to take I to T, i.e., I + O = T. There is no correlation between the input matrices (they are independent of each other).

The dataset I use is generated by rearranging that equation. Depending on which dataset I use, the matrix I is some incomplete probability grid (it could be random, close to the target matrix T, or empty). The expected output matrix is then T - I. During training, the network therefore gets I as the input and T - I as the solution matrix. I have also tried using Adaptive2dConvolutions with a fully connected layer, then reshaping the output back to the input size, but that did not seem to work either, although my code may have been incorrect.
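
As a concrete sketch, one training pair can be built like this (the helper name and the exact noise model here are just illustrative):

import torch

def make_training_pair(target, mode="random"):
    # target: 2-D tensor T of cell probabilities in [0, 1]
    if mode == "random":
        inp = torch.rand_like(target)            # arbitrary probability grid
    elif mode == "near":
        noise = 0.1 * torch.randn_like(target)   # small perturbation of T
        inp = (target + noise).clamp(0.0, 1.0)
    else:
        inp = torch.zeros_like(target)           # empty grid
    return inp, target - inp                     # input I, label O = T - I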

I’m trying to teach the network how to change the values of the input matrix in order to reach the target matrix. Currently, I’m using a stack of convolutions activated by swish (SiLU) functions, with mean squared error as my loss function. However, I would like to know if there have been similar problems to this one, as so far I have had no luck finding information about this specific topic.

My loss is somewhat low depending on the dataset, but it does not seem to improve over time. Furthermore, when I test the network, it does not work at all. For example, given an input matrix that is almost identical to T, the output does not increase the probability at the correct coordinates. Likewise, given an empty input matrix, the output does not increase the probability at the coordinates where it should.

More specifically, my questions are:

  • Are there any other projects similar to this that I can take inspiration from?
  • What sort of general model should I use? (convolutions only, fully connected, something else?)
  • Is MSE the right loss function for this task?

Here’s the code of my network:

import torch.nn as nn

class ProbabilityFinder(nn.Module):

	def __init__(self, batch_size):
		super(ProbabilityFinder, self).__init__()
		self.batch_size = batch_size
		self.conv1 = nn.Conv2d(1, 3, batch_size)
		self.conv2 = nn.Conv2d(3, 12, batch_size)
		self.conv3 = nn.Conv2d(12, 24, batch_size)
		self.conv4 = nn.Conv2d(24, 48, batch_size)
		self.finalConv = nn.Conv2d(48, 1, batch_size)
		self.silu = nn.SiLU()

	def forward(self, x):
		x = self.silu(self.conv1(x))
		x = self.silu(self.conv2(x))
		x = self.silu(self.conv3(x))
		x = self.silu(self.conv4(x))
		x = self.finalConv(x)
		return x
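
For reference, I train it with a standard MSE regression loop. Here’s a minimal sketch with dummy data (the grid size, learning rate, and epoch count are placeholders, not my real settings):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# dummy data purely for illustration: 256 random 16x16 grids
T = torch.rand(256, 1, 16, 16)   # targets
I = torch.rand(256, 1, 16, 16)   # inputs
loader = DataLoader(TensorDataset(I, T - I), batch_size=32, shuffle=True)

model = ProbabilityFinder(1)  # with 1, every Conv2d has kernel size 1, so H and W are preserved
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder learning rate

for epoch in range(10):
    for inputs, labels in loader:   # inputs = I, labels = T - I
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()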

I’m struggling to understand the motivation for adding the complexity of a neural network when you can compute the exact answer every time with T - I.

Second question: why are you setting the kernel_size of your convolution layers to the batch_size? Kernel sizes are normally fixed small constants, typically 1 or 3, sometimes 7. Tying them to the batch size also means each unpadded layer shrinks the grid by batch_size - 1 in each dimension, so the output cannot match the input size.
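
For a grid-to-grid mapping you’d normally fix the kernels at a small constant and pad so the output grid matches the input. A sketch with your channel widths (kernel size 3 and padding 1 are just the usual defaults, not tuned values):

import torch.nn as nn

class ProbabilityFinder(nn.Module):
    def __init__(self):
        super().__init__()
        # kernel_size=3 with padding=1 leaves H and W unchanged at every layer,
        # and the receptive field no longer depends on the batch size
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(3, 12, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(12, 24, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(24, 48, kernel_size=3, padding=1)
        self.finalConv = nn.Conv2d(48, 1, kernel_size=3, padding=1)
        self.silu = nn.SiLU()

    def forward(self, x):
        x = self.silu(self.conv1(x))
        x = self.silu(self.conv2(x))
        x = self.silu(self.conv3(x))
        x = self.silu(self.conv4(x))
        return self.finalConv(x)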

Third point: UNets are often used for extracting semantic information from images while returning an output of the same size, often combining skip connections, batch norm, and self-attention. I’d suggest looking into UNets for some ideas on structure, though you likely won’t need much downsampling given how simple your problem is.
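
You likely don’t need the full encoder/decoder either; the transferable idea is the skip connection. A single-scale sketch (the class name and widths are illustrative):

import torch
import torch.nn as nn

class TinySkipNet(nn.Module):
    # one resolution only: two conv blocks, with the early features
    # concatenated back in before the final projection (the "skip")
    def __init__(self, width=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, width, 3, padding=1), nn.SiLU())
        self.mid = nn.Sequential(nn.Conv2d(width, width, 3, padding=1), nn.SiLU())
        self.out = nn.Conv2d(2 * width, 1, 3, padding=1)

    def forward(self, x):
        e = self.enc(x)
        m = self.mid(e)
        return self.out(torch.cat([e, m], dim=1))  # skip: early + late features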

Basically, I’m going to be training this network to find new patterns, and then feeding it into a tree search algorithm later on.

Yeah, I just realized that’s an error. I’m re-training now and will update with the results I get.

Thanks for the UNets tip!