Hi,
I’m trying to solve a two-class image classification problem for a biomedical application.
After reading a lot in this forum, I’ve seen that one option is to use BCEWithLogitsLoss with a single output neuron. My classes are imbalanced, so I am using the pos_weight parameter, although I don’t know if I am using it correctly.
I have some questions…
1) Is using the Adam optimizer a good idea, or should I use SGD? This is what I have:
optimizer_ft = optim.Adam(params_to_update, hparams['learning_rate'])
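For reference, the SGD version I would try instead would be something like this (the momentum value is just a guess on my part):

optimizer_ft = optim.SGD(params_to_update, lr=hparams['learning_rate'], momentum=0.9)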
This is how I compute pos_weight:

num_positives = (labels == 1).sum().float()  # 250
num_negatives = (labels == 0).sum().float()  # 604
pos_weight = num_negatives / num_positives  # around 2.4
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight).to(get_device())
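To check that I understand what pos_weight does, I compared the loss against the formula from the docs, where pos_weight scales the positive term of the binary cross entropy (a minimal standalone check with made-up logits):

import torch
import torch.nn as nn

logits = torch.tensor([0.3, -1.2, 2.0])
targets = torch.tensor([1.0, 0.0, 1.0])
w = torch.tensor(2.4)  # num_negatives / num_positives

loss = nn.BCEWithLogitsLoss(pos_weight=w)(logits, targets)

# Manual version: -[w * y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))]
p = torch.sigmoid(logits)
manual = -(w * targets * torch.log(p) + (1 - targets) * torch.log(1 - p)).mean()
print(torch.allclose(loss, manual))  # prints True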
2) Is my train function correct? Here it is:
for epoch in range(num_epochs):
    print('Epoch {}/{}'.format(epoch + 1, num_epochs))
    print('-' * 10)

    for phase in ['train', 'val']:
        if phase == 'train':
            model.train()
        else:
            model.eval()

        running_loss = 0.0
        running_corrects = 0

        for inputs, labels_raw, _, _ in dataloaders[phase]:
            inputs = inputs.float()
            labels = labels_raw.float()
            inputs = inputs.to(device)
            labels = labels.to(device)

            optimizer.zero_grad()

            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                loss = criterion(outputs, labels.unsqueeze(1))
                _, preds = torch.max(outputs, dim=1)

                if phase == 'train':
                    loss.backward()
                    optimizer.step()

            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == torch.argmax(labels)).item()

        epoch_loss = running_loss / len(dataloaders[phase].dataset)
        epoch_acc = running_corrects / len(dataloaders[phase].dataset)

        if phase == 'val' and epoch_acc > best_acc:
            best_acc = epoch_acc
            best_model_wts = copy.deepcopy(model.state_dict())
        if phase == 'val':
            val_acc_history.append(epoch_acc)

        if phase == 'train':
            train_loss.append(epoch_loss)
            train_acc.append(epoch_acc)
        else:
            val_loss.append(epoch_loss)
            val_acc.append(epoch_acc)

        print('{} loss: {:.4f}, Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))
These are always my predictions:
tensor([0, 0, 0, 0, 0, 0, 0, 0], device='cuda:0')
This is a sample target for one batch of 8:
tensor([0., 0., 0., 1., 0., 0., 0., 1.], dtype=torch.float64)
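From other threads here, I understand that with a single output neuron the prediction should come from thresholding the sigmoid of the logit rather than from torch.max over dim=1, so maybe it should be something like this (untested sketch):

probs = torch.sigmoid(outputs)  # shape [batch_size, 1]
preds = (probs > 0.5).float()  # 1.0 for the positive class
running_corrects += (preds == labels.unsqueeze(1)).sum().item()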
In any case, there is definitely something wrong with my code and I am unable to find it. Any help would be appreciated.
I am also normalizing my images by computing the mean and std of the whole dataset for each channel and then normalizing each data split.
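Roughly like this (a simplified sketch, assuming the dataset yields C x H x W float tensors):

import torch
from torchvision import transforms

# Per-channel mean/std over the whole dataset
imgs = torch.stack([sample[0] for sample in dataset])  # shape [N, C, H, W]
mean = imgs.mean(dim=(0, 2, 3))
std = imgs.std(dim=(0, 2, 3))

# This Normalize is then applied to each split
normalize = transforms.Normalize(mean.tolist(), std.tolist())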
3) Should I normalize the validation/test split with the same parameters?
Thanks