I am trying to train a UNet model to do per-pixel regression on images. To do this, I split my large image (1000x1000) into 200x200-pixel tiles, and use those to train an FCN model with a linear final layer. The loss function is MSE loss. At prediction time, I extract the same boxes, run them through the model, and stitch the outputs back together to obtain a final output image. The problem I am getting is that there are discontinuities at the boundaries between boxes (I can clearly see the grid of boxes).
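For reference, the original non-overlapping version looked roughly like this (a simplified sketch, not my exact code; `img` is the (C, 1000, 1000) input array):

```python
import numpy as np
import torch

pred = np.zeros((1000, 1000))
model.eval()
with torch.no_grad():
    # predict each 200x200 tile independently and paste it into the canvas
    for i in range(0, 1000, 200):
        for j in range(0, 1000, 200):
            tile = torch.from_numpy(img[:, i:i+200, j:j+200]).float()
            out = model(tile.unsqueeze(dim=0).to(device))
            pred[i:i+200, j:j+200] = out[0, 0].cpu().numpy()
```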
I’ve tried to deal with this by feeding 250x250 boxes to my FCN and calculating the loss only over the 200x200 centre region. I do the same at the prediction stage: extract 250x250 patches, crop out the 200x200 centre region, and stitch the image back together. Please see some code below:
```python
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=LR)

for inputs, labels in train_loader:
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    output = model(inputs)
    output = output.squeeze()
    _, dimx, dimy = output.shape
    # compute the loss only on the 200x200 centre of each 250x250 patch
    loss = criterion(output[:, 25:dimx-25, 25:dimy-25],
                     labels[:, 25:dimx-25, 25:dimy-25])
    loss.backward()
    optimizer.step()
```
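For completeness, the 250x250 training patch pairs are produced along these lines (a simplified sketch rather than my exact dataset code; the `PatchDataset` name and the reflect padding, which keeps the border patches a full 250x250, are just for illustration):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class PatchDataset(Dataset):
    """Yields overlapping 250x250 input/label patches whose 200x200
    centres tile the original image with stride 200."""
    def __init__(self, img, target, centre=200, margin=25):
        # img: (C, H, W) array, target: (H, W) array
        self.centre, self.margin = centre, margin
        # reflect-pad so border patches still have 25 px of context
        self.img = np.pad(img, ((0, 0), (margin, margin), (margin, margin)),
                          mode='reflect')
        self.target = np.pad(target, margin, mode='reflect')
        _, H, W = img.shape
        self.coords = [(i, j) for i in range(0, H, centre)
                              for j in range(0, W, centre)]

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, k):
        i, j = self.coords[k]                  # top-left of the centre, original coords
        size = self.centre + 2 * self.margin   # 250
        x = self.img[:, i:i + size, j:j + size]
        y = self.target[i:i + size, j:j + size]
        return torch.from_numpy(x).float(), torch.from_numpy(y).float()
```

`train_loader` above is then just a `DataLoader` over this dataset.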
My code for predictions is as follows:
```python
pred = np.zeros((height, width))
model.eval()
with torch.no_grad():
    for i in range(25, height - 25, 200):
        for j in range(25, width - 25, 200):
            # 250x250 patch whose 200x200 centre starts at (i, j)
            patch = img[:, i-25:i+225, j-25:j+225]
            patch = torch.from_numpy(patch).float().unsqueeze(dim=0).to(device)
            out = model(patch)
            # keep only the 200x200 centre and stitch it into the canvas
            out = out[0, 0, 25:225, 25:225]
            pred[i:i+200, j:j+200] = out.cpu().numpy()
```
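This assumes `img` has already been reflect-padded by 25 px on each side (so `height` and `width` are the padded dimensions), with the 25-px frame cropped off at the end; roughly like this, where `raw_img` is just an illustrative name for the unpadded (C, 1000, 1000) array:

```python
import numpy as np

margin = 25
# reflect-pad so every 250x250 patch lies fully inside the array
img = np.pad(raw_img, ((0, 0), (margin, margin), (margin, margin)),
             mode='reflect')
height, width = img.shape[1], img.shape[2]   # padded dims used by the loop above
# ... run the stitching loop ...
pred = pred[margin:height - margin, margin:width - margin]  # back to 1000x1000
```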
I’m not sure if my problem makes complete sense. I can provide more clarification if necessary, but I have been stuck on this for a while now.