Combining two models: correct use of CrossEntropyLoss (classification) and SmoothL1Loss (regression)



ModelA: nn.CrossEntropyLoss() → used for classification.
ModelB: nn.SmoothL1Loss(reduction='mean') → used for regression.
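For reference, this is how I instantiate the two loss functions (the names match the training loop below):

import torch.nn as nn

loss_fn_class = nn.CrossEntropyLoss()                  # expects raw logits and class indices
loss_fn_distance = nn.SmoothL1Loss(reduction='mean')   # expects prediction and target of the same shape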

I have two questions:
1) How do I combine the two models?
2) What should the distance_regressor look like, together with loss_fn_distance?

Referring to 1) Combining two models

class_dist_parameters = list(classifier.parameters()) + list(distance_regressor.parameters())
optimizer = torch.optim.Adam(class_dist_parameters, lr=0.001)
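An alternative would be per-model parameter groups, in case the two models need different learning rates (a sketch; the second learning rate is a made-up value):

optimizer = torch.optim.Adam([
    {'params': classifier.parameters(), 'lr': 1e-3},
    {'params': distance_regressor.parameters(), 'lr': 1e-4},  # hypothetical smaller lr
])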

Within my training_step I do everything sequentially, like this:

import torch.nn.functional as F  # for F.softplus

classifier.train()
distance_regressor.train()

# within the for loop:
for batch_idx, (image_dl, rois_dl, class_label_dl, distance_label_dl, df_train_dl) in enumerate(dataloader):

    # bring the distance targets into shape [batch_size, 1] to match the prediction
    distance_label_dl = distance_label_dl.reshape(distance_label_dl.shape[0], 1)

    # 1. Forward pass through both models on the same ROI features
    output_class_pred = classifier(rois_dl)
    roi_output_distance_pred = distance_regressor(rois_dl)

    # softplus keeps the predicted distance non-negative
    output_distance_pred = F.softplus(roi_output_distance_pred)

    # 2. Calculate and accumulate loss
    # (the size mismatch (got input: [9], target: [8]) was fixed with the reshape above)
    loss_class = loss_fn_class(output_class_pred, class_label_dl)
    train_loss_class += loss_class.item()

    loss_distance = loss_fn_distance(output_distance_pred, distance_label_dl)
    train_loss_distance += loss_distance.item()

    # 3. Optimizer zero grad
    optimizer.zero_grad()

    # 4. Loss backward
    # backpropagating the two losses separately; summing them first and calling
    # a single backward would be equivalent: (loss_class + loss_distance).backward()
    loss_class.backward()
    loss_distance.backward()

    # 5. Optimizer step
    optimizer.step()

    # Calculate and accumulate accuracy metric across all batches
    # (softmax is monotonic, so argmax over the raw logits would give the same class)
    y_pred_class_no = torch.argmax(torch.softmax(output_class_pred, dim=1), dim=1)

    #######################################
    # Not sure how to continue with the distance_regressor
    #######################################
    # for regression there is no argmax; the prediction is the output itself
    y_pred_distance_no = output_distance_pred.detach()

    ### Classifier seems to work like this.
    train_acc_class += (y_pred_class_no == class_label_dl).sum().item() / len(output_class_pred)

    ### this accumulates the per-batch MSE; a square root is still needed for an RMSE
    train_rmse_acc_distance += torch.pow(y_pred_distance_no - distance_label_dl, 2).sum().item() / len(output_distance_pred)
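After the loop I would then turn the accumulated value into an epoch-level RMSE like this (a sketch, assuming train_rmse_acc_distance sums the per-batch MSE as above):

import math

# average the per-batch MSE over all batches, then take the square root
epoch_rmse_distance = math.sqrt(train_rmse_acc_distance / len(dataloader))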



Is there anything I forgot, or anything special to watch out for when combining the two models?

Referring to
2) What should the distance_regressor look like, together with loss_fn_distance?

def build_distance_regressor(in_feature_size, first_layer_size, second_layer_size, out_feature_size):
    distance_regressor = nn.Sequential(
        nn.Flatten(1, -1),
        nn.Linear(in_features=in_feature_size, out_features=first_layer_size),
        nn.Linear(in_features=first_layer_size, out_features=second_layer_size),
        nn.Linear(in_features=second_layer_size, out_features=out_feature_size),
    )
    return distance_regressor.to(device)

with

distance_regressor = build_distance_regressor(in_feature_size=214016, first_layer_size=2048, second_layer_size=512, out_feature_size=1)
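One thing I am unsure about: without activation functions, the three nn.Linear layers collapse into a single linear mapping. A variant I could try (just a sketch; the choice of ReLU is my assumption):

def build_distance_regressor(in_feature_size, first_layer_size, second_layer_size, out_feature_size):
    distance_regressor = nn.Sequential(
        nn.Flatten(1, -1),
        nn.Linear(in_features=in_feature_size, out_features=first_layer_size),
        nn.ReLU(),  # nonlinearity, so the stack does not reduce to one linear layer
        nn.Linear(in_features=first_layer_size, out_features=second_layer_size),
        nn.ReLU(),
        nn.Linear(in_features=second_layer_size, out_features=out_feature_size),
    )
    return distance_regressor.to(device)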

But …
Right now I don't see any improvement from the distance_regressor.
The model output is a large negative value, which means softplus squashes it to (almost) zero, and even after 20 epochs nothing changes.
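A quick check confirms this behavior (the -20.0 is just an example value):

import torch
import torch.nn.functional as F

print(F.softplus(torch.tensor(-20.0)))  # tensor(2.0612e-09) -> effectively zero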

The classifier seems to work on its own (a simple linear layer).
So I am thinking about normalizing the distance targets with min–max scaling.
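Something like this (a sketch; dist_min/dist_max would be computed on the training set, and pred_scaled is a hypothetical model output):

# scale the targets to [0, 1] with training-set statistics
dist_min, dist_max = distance_targets.min(), distance_targets.max()
distance_targets_scaled = (distance_targets - dist_min) / (dist_max - dist_min)

# at inference time, undo the scaling on the prediction
pred_distance = pred_scaled * (dist_max - dist_min) + dist_min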

Any ideas on how to get the distance_regressor working?

Maybe you have an idea? @ptrblck :smiley:

P.S. I already consulted this topic: A model with multiple outputs