Hello, as part of my final thesis I want to train a neural network to predict shorelines in aerial images using an LSTM. I am pretty new to PyTorch, so I am also using this project to learn it from scratch.
I chose an LSTM because my dataset has few samples: I use the columns of each image as the LSTM input, and the target output is the pixel labeled as shoreline in that column. I have 8 training images, but each image has nearly 700 columns, so I end up with about 5600 columns to train the network.
I have looked into applying LSTM networks to this use case, but I can’t find much information about it, so I am testing some architectures and seeing how they perform.
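To make the sample count concrete, here is a minimal sketch of how per-column samples could be enumerated. The function name `build_column_samples` and the toy labels are hypothetical; the only things taken from the post are the round numbers (8 images, 700 columns each) and the convention that a column index of -1 is skipped, as in the train function further down.

```python
def build_column_samples(points_per_image):
    """Flatten per-image (col, row) labels into (image_idx, col, row) samples.

    Entries whose column is -1 are skipped.
    """
    samples = []
    for image_idx, points in enumerate(points_per_image):
        for col, row in points:
            if col != -1:
                samples.append((image_idx, col, row))
    return samples


# Toy labels (made up): 8 images, 700 labeled columns each, plus one skipped entry.
toy_points = [
    [(c, 100 + c % 50) for c in range(700)] + [(-1, -1)]
    for _ in range(8)
]
samples = build_column_samples(toy_points)
print(len(samples))  # 5600
```

Each tuple identifies one training example: which image, which column to slice out, and which row the network should predict.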
Right now, I’ve defined my neural network like this:
import torch.nn as nn

class BiLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes, num_layers):
        super(BiLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, num_layers=num_layers, bidirectional=True)
        # hidden_size * 2 because the LSTM is bidirectional
        self.hidden2line = nn.Linear(hidden_size*2, 128)
        self.line2class = nn.Linear(128, 1)
        self.relu = nn.ReLU()

    def forward(self, input_col):
        lstm_out, (hn, cn) = self.lstm(input_col.float())
        # use only the LSTM output at the last time step
        out = self.hidden2line(lstm_out[-1, :])
        out = self.relu(out)
        out = self.line2class(out)
        return out

    def backward(self, loss):
        loss.backward()
The input is a column (in LAB color space), and the output is a number that corresponds to the Y coordinate (row) of the shoreline in that column.
I am using MSE loss and a learning rate of 0.0001. I’ve also tested 0.001 and 0.00001, and the only difference is whether I need more or fewer epochs to get good results.
So, for example, if the point I’m looking for is (232, 202), I pass column 232 of the image to the network and expect 202 as the output.
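A small shape check for this example may help when reasoning about what the LSTM actually sees. This is only a sketch under assumptions that the post does not state: the image tensor is laid out as (channels, height, width), the 480 x 700 size and the hidden size of 16 are made up, and the PyTorch version is recent enough to accept unbatched 2-D input to `nn.LSTM`. Under that layout, `img[:, :, col]` has shape (3, H), which the LSTM reads as 3 time steps with H features each; transposing gives H time steps with 3 features each. Both readings are shown for comparison only.

```python
import torch
import torch.nn as nn

# Assumed layout: (channels, height, width); sizes are made up for illustration.
img = torch.rand(3, 480, 700)
col, row = 232, 202  # the example point from the post

image_col = img[:, :, col]         # shape (3, 480)
target = torch.tensor(float(row))  # regression target: the shoreline row

# Reading A: 3 time steps, each with 480 features (what the slice gives directly).
lstm_a = nn.LSTM(input_size=480, hidden_size=16, bidirectional=True)
out_a, _ = lstm_a(image_col)       # unbatched input: (seq_len, input_size)

# Reading B: 480 time steps (pixels from top to bottom), each with 3 LAB features.
lstm_b = nn.LSTM(input_size=3, hidden_size=16, bidirectional=True)
out_b, _ = lstm_b(image_col.T)     # input shape (480, 3)

print(image_col.shape, out_a.shape, out_b.shape)

# MSE between a stand-in prediction (not a model output) and the target row.
pred = torch.tensor(200.0)
loss = nn.MSELoss()(pred, target)
print(loss.item())  # 4.0, i.e. (200 - 202) ** 2
```

The last dimension of both outputs is 32 (2 x hidden size) because the LSTM is bidirectional, which is why the first linear layer in the model takes `hidden_size*2` inputs.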
This is my train function:
def train(model, device, train_loader, optimizer, epoch):
    # criterion (the MSE loss) is defined outside this function
    model.train()
    for images, images_points in train_loader:
        for idx, img in enumerate(images):
            for point in images_points[idx]:
                col, row = point[0].item(), point[1].item()
                # skip entries whose column index is -1
                if col != -1:
                    image_col = img[:, :, col]
                    data_input, target = image_col.to(device), point[1].to(device)
                    optimizer.zero_grad()
                    output = model(data_input)
                    loss = criterion(output[0], target.float())
                    model.backward(loss)
                    optimizer.step()
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, idx + 1, len(train_loader.dataset),
                100. * (idx + 1) / len(train_loader.dataset), loss.item()))
    print('=============================================================')
Right now I’m getting “close” results: for example, after 10 epochs with batch size 16, I’m getting nearly 95% precision.
But, due to the nature of aerial images, a shift of just 5 pixels from the real shoreline may be equivalent to several meters, so I’m working on improving the neural network to get precision over 98%.
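The post does not say how the precision figure is computed, so the following is only one possible way to tie it to a pixel budget: the fraction of columns whose predicted row lies within a chosen tolerance of the labeled row. The function name `fraction_within_tolerance` and the sample numbers are hypothetical.

```python
def fraction_within_tolerance(predicted_rows, true_rows, tolerance_px):
    """Share of columns whose prediction is at most tolerance_px away from the label."""
    if len(predicted_rows) != len(true_rows):
        raise ValueError("predictions and labels must have the same length")
    hits = sum(
        1
        for pred, true in zip(predicted_rows, true_rows)
        if abs(pred - true) <= tolerance_px
    )
    return hits / len(true_rows)


# Made-up predictions for four columns.
predicted = [201.4, 198.0, 210.0, 150.2]
labeled = [202, 202, 202, 150]
print(fraction_within_tolerance(predicted, labeled, tolerance_px=5))  # 0.75
print(fraction_within_tolerance(predicted, labeled, tolerance_px=1))  # 0.5
```

Reporting the score at several tolerances (for example 1, 3 and 5 pixels) shows how much of the remaining error is small jitter and how much is large misses.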
Can someone tell me things I can try to improve my neural network?
Thank you.