Predicting shorelines using an LSTM

Hello, as part of my final thesis I want to train a neural network to predict shorelines in aerial images using an LSTM. I am pretty new to PyTorch, so I am also using this project to learn it from scratch.

I chose an LSTM because my dataset has few samples: I use the columns of each image as the input to the LSTM, and the pixel labeled as shoreline in that column is the target output. I have only 8 training images, but each image has nearly 700 columns, so I end up with about 5600 columns to train the network.
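A minimal sketch of how such per-column samples could be enumerated, to make the count concrete. The function name `build_column_samples`, the nested-list label format, and the use of -1 for unlabeled columns are assumptions for illustration, and the labels are dummy values.

```python
def build_column_samples(shoreline_rows):
    """Flatten per-image shoreline labels into (image_idx, col, row) samples.

    shoreline_rows[i][c] is the labeled shoreline row for column c of image i,
    or -1 when that column has no label (an assumed convention).
    """
    samples = []
    for image_idx, rows in enumerate(shoreline_rows):
        for col, row in enumerate(rows):
            if row != -1:  # skip unlabeled columns
                samples.append((image_idx, col, row))
    return samples


# Dummy labels: 8 images, 700 columns each, every column labeled at row 202.
labels = [[202] * 700 for _ in range(8)]
samples = build_column_samples(labels)
print(len(samples))  # 5600
```

Each tuple identifies one training example: which image, which column to feed in, and which row the network should predict.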

I have looked into applying LSTM networks to this use case, but I can’t find much information about it, so I am testing some architectures and seeing how they perform.

Right now, I’ve defined my neural network like this:

import torch.nn as nn

class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_layers):
        # Note: num_classes is accepted but not used; the head outputs one value.
        super().__init__()
        self.hidden_size = hidden_size

        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)
        # A bidirectional LSTM concatenates both directions, hence hidden_size*2.
        self.hidden2line = nn.Linear(hidden_size*2, 128)
        self.line2class = nn.Linear(128, 1)
        self.relu = nn.ReLU()

    def forward(self, input_col):
        lstm_out, (hn, cn) = self.lstm(input_col.float())
        # Use only the LSTM output at the last sequence step.
        out = self.hidden2line(lstm_out[-1, :])
        out = self.relu(out)
        out = self.line2class(out)  # single regression value: the predicted row
        return out

    def backward(self, loss):
        loss.backward()

The input is a column (in LAB color space), and the output is a number corresponding to the y-coordinate (row) of the shoreline in that column.

I am using MSE loss and a learning rate of 0.0001. I’ve also tested 0.001 and 0.00001, and the only difference is whether I need more or fewer epochs to get good results.
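For reference, with a single scalar target the MSE loss is just the squared pixel error, so a loss value can be read back as a distance in pixels by taking its square root. A small sketch in plain Python (the helper name `mse` is only illustrative and stands in for the loss function described above):

```python
import math


def mse(predictions, targets):
    """Mean squared error between predicted and true shoreline rows."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# A single prediction that is 5 pixels off gives a loss of 25.
loss = mse([207.0], [202.0])
print(loss)             # 25.0
print(math.sqrt(loss))  # 5.0, the error in pixels
```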

So, for example, if the point I’m looking for is (232, 202), I pass column 232 of the image to the network and expect 202 as the output.
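A minimal sketch of that input/target pair with a dummy image, mainly to show the tensor shapes involved. It assumes a (channels, height, width) layout, which the slicing `img[:, :, col]` in the train function suggests but does not confirm; the image size and `hidden_size=64` are made-up values, and unbatched 2-D input to `nn.LSTM` needs a reasonably recent PyTorch (1.11 or later).

```python
import torch
import torch.nn as nn

# Assumed layout: (channels, height, width); the sizes are dummy values.
channels, height, width = 3, 480, 700
img = torch.rand(channels, height, width)

col, row = 232, 202
image_col = img[:, :, col]           # shape: (3, 480)
target = torch.tensor(float(row))    # scalar target: the shoreline row

# With batch_first=False and a 2-D input, nn.LSTM treats the tensor as an
# unbatched sequence of shape (seq_len, input_size). Here that means
# seq_len = 3 (the LAB channels) and input_size = 480 (the column height).
lstm = nn.LSTM(input_size=height, hidden_size=64, bidirectional=True)
lstm_out, (hn, cn) = lstm(image_col)

print(image_col.shape)  # torch.Size([3, 480])
print(lstm_out.shape)   # torch.Size([3, 128])
```

Under this layout the LSTM steps over the three color channels rather than over the pixels of the column, which is worth keeping in mind when choosing `input_size`.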

This is my train function:

def train(model, device, train_loader, optimizer, epoch):
    # Note: criterion (the MSE loss) is expected to be defined outside this function.
    model.train()
    for images, images_points in train_loader:
        for idx, img in enumerate(images):
            for point in images_points[idx]:
                col, row = point[0].item(), point[1].item()
                if col != -1:  # skip points whose column is -1
                    image_col = img[:, :, col]  # one image column
                    data_input, target = image_col.to(device), point[1].to(device)
                    optimizer.zero_grad()
                    output = model(data_input)
                    loss = criterion(output[0], target.float())
                    model.backward(loss)
                    optimizer.step()  # one weight update per column
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                                epoch, idx + 1, len(train_loader.dataset),
                                100. * (idx + 1) / len(train_loader.dataset), loss.item()))
        print('=============================================================')

Right now I’m getting “close” results: for example, after 10 epochs with a batch size of 16, I’m getting nearly 95% precision.

But, due to the nature of aerial images, a shift of just 5 pixels from the real shoreline may correspond to several meters, so I’m working on improving the neural network so that precision exceeds 98%.
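Since a few pixels matter here, one possible way to measure this is the share of columns whose prediction falls within a given pixel tolerance of the label. This is only a sketch of such a metric, not necessarily how the precision figures above were computed; the function name `fraction_within_tolerance` and the sample values are hypothetical.

```python
def fraction_within_tolerance(predicted_rows, true_rows, tolerance_px):
    """Share of columns whose predicted row is within tolerance_px of the label."""
    if len(predicted_rows) != len(true_rows) or not true_rows:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - t) <= tolerance_px for p, t in zip(predicted_rows, true_rows))
    return hits / len(true_rows)


# Dummy predictions for four columns, all labeled at row 202.
predicted = [202.4, 205.0, 209.0, 198.0]
labels = [202, 202, 202, 202]
print(fraction_within_tolerance(predicted, labels, tolerance_px=5))  # 0.75
```

Reporting this for several tolerances (for example 1, 2 and 5 pixels) shows how far off the misses are, which a single percentage hides.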

Can someone suggest things I could try to improve my neural network?

Thank you.