Hello,
first of all, please excuse my English; I am trying my best.
My aim is to create a fully convolutional net for binary classification that can handle input images of different sizes. At the moment the code runs without errors, but the network is not learning: the accuracy stays at around 50% and the loss barely changes.
I took the following measures to be able to use different image sizes:
- Since a DataLoader cannot handle varying image sizes by default, I am using a custom collate function that puts all image tensors into a list instead of one big tensor:
def collate_data(batch):
    # Keep the variable-size images as a plain Python list
    data = [item[0] for item in batch]
    # The labels all have the same shape, so they can be stacked into one tensor
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)
    return [data, target]
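For context, here is a self-contained sketch of how that collate function plugs into a DataLoader. The toy dataset and its image sizes are made up purely for illustration:

```python
import torch
from torch.utils.data import DataLoader, Dataset

def collate_data(batch):
    # Keep variable-size images in a plain list; stack only the labels.
    data = [item[0] for item in batch]
    target = torch.LongTensor([item[1] for item in batch])
    return [data, target]

class ToyDataset(Dataset):
    # Hypothetical dataset: single-channel images of varying spatial size.
    def __init__(self):
        self.sizes = [(1, 8, 8), (1, 12, 10), (1, 16, 16), (1, 9, 11)]
    def __len__(self):
        return len(self.sizes)
    def __getitem__(self, idx):
        return torch.randn(self.sizes[idx]), idx % 2  # (image, binary label)

loader = DataLoader(ToyDataset(), batch_size=2, collate_fn=collate_data)
images, labels = next(iter(loader))
# images is a Python list of 2 tensors with different shapes,
# labels is one LongTensor of shape (2,)
```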
- I cannot feed my model a list of tensors, so in my main loop I iterate over the list and feed each image one by one, concatenating the outputs of the net into one FloatTensor:
for epoch in range(start_epoch, num_epochs):
    for i, (images, labels) in enumerate(train_loader, 0):
        labels = labels.to(device)
        # Accumulator must live on the same device as the model outputs
        outputs = torch.Tensor().to(device)
        for im in images:
            im = im.unsqueeze(0)
            im = im.to(device)
            # Forward pass
            out = model(im)
            outputs = torch.cat((outputs, out))
        outputs = outputs.reshape(labels.shape[0], -1)
        loss = criterion(outputs, labels)
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
I also tried calculating the loss inside the inner for loop, summing it, and averaging. I also tried calling loss.backward() inside the inner for loop and making the optimizer step only outside of it. All variants produce the same result.
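For reference, here is a minimal, runnable sketch of that per-sample backward variant. The tiny model, criterion, and data are hypothetical stand-ins; the point is only that gradients accumulate across backward() calls until the single optimizer step:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: a tiny conv net with global pooling,
# so it accepts inputs of any spatial size.
model = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(4, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

images = [torch.randn(1, 1, 8, 8), torch.randn(1, 1, 12, 10)]
labels = torch.tensor([0, 1])

# One backward per sample, one optimizer step per batch:
# .grad buffers accumulate across the backward() calls.
optimizer.zero_grad()
for im, lb in zip(images, labels):
    out = model(im)
    loss = criterion(out, lb.unsqueeze(0)) / len(images)  # average over the batch
    loss.backward()
optimizer.step()
```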
When I take the very same network architecture but use the default collate fn and the following main loop, everything runs as expected (assuming all images have the same size):
for i, (images, labels) in enumerate(train_loader, 0):
    images = images.to(device)
    labels = labels.to(device)
    # Forward pass
    outputs = model(images)
    loss = criterion(outputs, labels)
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
In both cases, outputs and labels have the same shapes before they are fed to the criterion function.
At the moment I see only one possible cause for the network not learning:
autograd is not working as I expect in the inner for loop. Maybe building the new 'outputs' tensor via concatenation does not produce a gradient that can be backpropagated properly.
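To test that suspicion in isolation, here is a small toy experiment (w is just a stand-in parameter, not my model) that builds an output tensor the same way, via repeated torch.cat, and then backpropagates through it:

```python
import torch

w = torch.randn(3, requires_grad=True)
outputs = torch.Tensor()  # empty accumulator, as in the loop above
for i in range(4):
    out = (w * (i + 1)).sum().reshape(1)
    outputs = torch.cat((outputs, out))
outputs.sum().backward()
print(w.grad)  # each element is 1 + 2 + 3 + 4 = 10
```

Here w.grad comes out non-None (and equal to 10 per element), so concatenation by itself seems to keep the graph intact, which makes me even less sure where the problem is.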
Do you have any suggestions regarding this issue? Maybe I just have a major error in my reasoning.
Thank you very much in advance.