Probably a newbie mistake but I’ve run out of ideas, so hoping someone can point me in the right direction?
My goal is for the model to predict X, Y co-ordinates on an image based on features in that image. The issue is that the outputs are converging to the same co-ordinates for all images, and getting stuck.
I have managed to overfit on the training data using the Adam and SGD optimizers, so I know it's learning something. I have tried different models, e.g. ResNet18, and created my own Conv2d model to simplify, but ran into the same problem.
The current model is a ResNet34 using an AdaBound optimizer. I have about 500 images and have augmented them to create about 3500, all scaled down and converted to greyscale (with the typical 20% held out for validation).
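import torch.nn as nn
from torchvision import models

# Untrained ResNet34, adapted to single-channel (greyscale) input
model_ft = models.resnet34(pretrained=False)
num_of_channels = 1
model_ft.conv1 = nn.Conv2d(num_of_channels, 64, kernel_size=3, stride=1, padding=0, bias=False)
# Replace the final layer with two outputs: one for X, one for Y
num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)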
As you can see, I changed the output to only two features: one for X and another for Y. The AdaBound optimizer is:
optimizer_ft = optim2.AdaBound(
betas= (0.9, 0.999),
final_lr = 0.1,
I have written my own loss function, which works out the mean Euclidean distance between the actual and predicted co-ordinates:
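def loss_function(outputs, actuals):
    # Mean Euclidean distance between predicted and actual (x, y) points
    total = 0
    for i in range(0, len(outputs)):
        x = (outputs[i][0] - actuals[i][0])  # error along X
        y = (outputs[i][1] - actuals[i][1])  # error along Y
        total += torch.sqrt((x**2 + y**2))
    mean = 1.0/len(outputs) * total
    return mean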
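For reference, the same quantity can be computed without the Python loop. The sketch below assumes `outputs` and `actuals` are float tensors of shape `(N, 2)`; the name `mean_distance_loss` and the `eps` term are illustrative additions, not part of the code above. The `eps` is there because the gradient of `sqrt` is infinite at exactly zero.

```python
import torch

def mean_distance_loss(outputs, actuals, eps=1e-8):
    """Mean Euclidean distance between predicted and actual (x, y) points.

    outputs, actuals: float tensors of shape (N, 2).
    eps: small constant (an assumption) that keeps sqrt differentiable
         when a prediction matches its target exactly.
    """
    diff = outputs - actuals            # (N, 2) per-coordinate errors
    sq_dist = (diff ** 2).sum(dim=1)    # (N,) squared distances
    return torch.sqrt(sq_dist + eps).mean()

# Toy batch: the first point is off by a 3-4-5 triangle, the second is exact.
outputs = torch.tensor([[3.0, 4.0], [1.0, 1.0]], requires_grad=True)
actuals = torch.tensor([[0.0, 0.0], [1.0, 1.0]])

loss = mean_distance_loss(outputs, actuals)
loss.backward()
print(loss.item())  # roughly 2.5
```

The result is a scalar tensor, so `loss.backward()` can be called on it directly.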
And the main training loop is as follows:
for i in range(0, length, BATCH_SIZE):
    # Slice out the current mini-batch and add the single greyscale channel dimension
    if phase == 'train':
        inputs = training_batch[i:i+BATCH_SIZE].view(-1, 1, scaled_h, scaled_w)
        actuals = training_label[i:i+BATCH_SIZE]
    else:
        inputs = validation_batch[i:i+BATCH_SIZE].view(-1, 1, scaled_h, scaled_w)
        actuals = validation_label[i:i+BATCH_SIZE]
    inputs = inputs.to(device)
    actuals = actuals.to(device)
    # zero the parameter gradients
    optimizer.zero_grad()
    with torch.set_grad_enabled(phase == 'train'):
        outputs = model(inputs)
        loss = loss_function(outputs, actuals)
        # backward + optimize only if in training phase
        if phase == 'train':
            loss.backward()
            optimizer.step()
Here is a sample output of a single batch after it's trained for about 20 epochs (as you can see, all the numbers are the same):
[153.2917, 93.2165]], device='cuda:0', grad_fn=)
[282., 36.]], device='cuda:0')
The Loss is: tensor(64.1859, device='cuda:0', grad_fn=)
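One way to check whether a network has collapsed to a single constant prediction is to compare its loss with a simple baseline that always predicts the mean label. The sketch below works on plain Python lists of `(x, y)` pairs (for tensors, `.tolist()` would produce them) and needs Python 3.8+ for `math.dist`; the function names and sample points are made up for illustration.

```python
import math

def mean_point(points):
    """Average (x, y) over a list of co-ordinate pairs."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def mean_distance(preds, targets):
    """Mean Euclidean distance between paired predictions and targets."""
    return sum(math.dist(p, t) for p, t in zip(preds, targets)) / len(targets)

def constant_baseline_loss(targets):
    """Loss obtained by predicting the mean target for every image."""
    centre = mean_point(targets)
    return mean_distance([centre] * len(targets), targets)

# Made-up labels, only to show the call pattern.
targets = [(0.0, 0.0), (6.0, 0.0), (0.0, 8.0), (6.0, 8.0)]
baseline = constant_baseline_loss(targets)
print(baseline)  # 5.0
```

If the validation loss sits close to this baseline, the model has probably learned little beyond the average position of the labels.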
Right, hopefully the above has given enough insight/clues into what I'm not understanding.
I have a sneaky feeling that I don't have enough features on the image to get the accuracy I want.