Hi. Could somebody help me find the mistakes in my training and validation loops? Everything looks fine to me, but from the second epoch onward (the first epoch looks normal), every image in the validation dataset is predicted as the same class during validation (say I am classifying into one of four classes: everything gets predicted as, e.g., class 1 or class 4).
It’s based on an RCNN model.
My training function:
```python
def train(train_loader, model, optimizer, epoch, device):
    model.train()
    loss_monitor = AverageMeter()
    with tqdm(train_loader) as _tqdm:
        for x, y in _tqdm:
            x = x.to(device)
            y = y.to(device)
            outputs = model(x, y)
            loss = outputs["loss_classifier"]
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return loss  # I know it's unnecessary
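Side note: `loss_monitor` is created but never updated, so the function only returns the last batch's loss. A minimal sketch of what an `AverageMeter` typically looks like (the class name is from your code; this implementation is an assumption) and how the loop could use it to return the epoch-average loss instead:

```python
class AverageMeter:
    """Running sum and count, so .avg reports the mean (assumed implementation)."""
    def __init__(self):
        self.sum = 0.0
        self.count = 0

    def update(self, value, n=1):
        # value: per-batch mean loss; n: batch size
        self.sum += value * n
        self.count += n

    @property
    def avg(self):
        return self.sum / max(self.count, 1)


# Inside the training loop (sketch):
#   loss_monitor.update(loss.item(), x.size(0))
# and at the end:
#   return loss_monitor.avg
meter = AverageMeter()
meter.update(2.0, n=4)  # batch of 4, mean loss 2.0
meter.update(1.0, n=4)  # batch of 4, mean loss 1.0
print(meter.avg)        # 1.5
```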
Validation function:
```python
def validate(val_loader, model, epoch, device):
    model.eval()
    preds = []
    gt = []
    with torch.no_grad():
        with tqdm(val_loader) as _tqdm:
            for x, y in _tqdm:
                x = x.to(device)
                y = y.to(device)
                gt.append(y["class"].cpu().numpy())
                outputs = model(x, y)
                for output in outputs:
                    pred = F.softmax(output["age"], dim=-1).cpu().numpy()
                    pred = (pred * np.arange(0, pred.size)).sum(axis=-1)
                    preds.append(np.array([pred]))
                _tqdm.set_postfix(OrderedDict(stage="val", epoch=epoch))
    mae = calculate_mae(gt, preds)  # calculates MAE (the classes are ordinal: class 1 is closer to 2 than to 3)
    f1 = calculate_f1(gt, preds)   # calculates F1
    return mae, f1
```
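One thing worth noting: the expected-value trick `(pred * np.arange(0, pred.size)).sum(axis=-1)` yields a continuous score, which suits MAE on ordinal classes, but F1 needs discrete class labels. A NumPy-only sketch (the function name and `num_classes=4` are assumptions) of deriving both from the same softmax output:

```python
import numpy as np

def to_predictions(probs, num_classes=4):
    """probs: (N, num_classes) softmax outputs.
    Returns (expected_value, class_label) per sample."""
    classes = np.arange(num_classes)
    expected = (probs * classes).sum(axis=-1)  # continuous score, for MAE
    # round the expectation to the nearest valid class, for F1;
    # an alternative is probs.argmax(axis=-1)
    labels = np.clip(np.rint(expected), 0, num_classes - 1).astype(int)
    return expected, labels

probs = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.0, 0.1, 0.2, 0.7]])
expected, labels = to_predictions(probs)
print(expected)  # [1.2 2.6]
print(labels)    # [1 3]
```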
Main loop:
```python
model = PornRCNN.create_resnet_50()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0001)
model = model.to(device)
model.set_age_loss_fn(loss_classifier)

scheduler = StepLR(
    optimizer,
    step_size=0.0001,
    gamma=0.2,
    last_epoch=start_epoch - 1,
)

best_val_f1 = 0
for epoch in range(start_epoch, num_epoch):
    train_loss = train(train_loader, model, optimizer, epoch, device)
    mae, f1 = validate(val_loader, model, epoch, device)
    if f1 > best_val_f1:
        model_state_dict = model.state_dict()
        best_val_f1 = f1
    scheduler.step()
```
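One suspicious detail here: `StepLR`'s `step_size` is a number of epochs, not a learning rate, and `0.0001` looks like a copy of `lr`. StepLR follows the schedule `lr = base_lr * gamma ** (epoch // step_size)`, so a pure-Python sketch of what your values do (the collapse starts right after the first `scheduler.step()`, which matches the timing you describe):

```python
def step_lr(base_lr, gamma, step_size, epoch):
    # Closed-form StepLR schedule: decay by gamma every step_size epochs.
    return base_lr * gamma ** (epoch // step_size)

# Intended use: an integer epoch count, e.g. step_size=30.
print(step_lr(1e-4, 0.2, 30, 1))      # 0.0001 (no decay yet)

# With step_size=0.0001: epoch // 0.0001 is already ~10000 at epoch 1,
# so gamma**10000 underflows and the learning rate collapses to 0
# after the very first scheduler.step().
print(step_lr(1e-4, 0.2, 0.0001, 1))  # 0.0
```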
Any ideas why it behaves like this? Do you have any tips on how to do it better?
I should add that the training loss decreases normally, so that’s not the problem.
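For anyone debugging the same symptom: a quick sanity check is to histogram the predicted labels every epoch, so the single-class collapse shows up immediately. A small NumPy sketch, assuming the predictions have already been converted to integer class labels (`num_classes=4` as in the question):

```python
import numpy as np

def prediction_histogram(preds, num_classes=4):
    """Count how often each class is predicted per epoch;
    a single non-zero bin confirms the collapse described above."""
    return np.bincount(np.asarray(preds, dtype=int), minlength=num_classes)

print(prediction_histogram([3, 3, 3, 3, 3]))  # [0 0 0 5] -> collapsed onto class 3
print(prediction_histogram([0, 1, 1, 2, 3]))  # [1 2 1 1] -> healthy spread
```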