Saving and loading a model in PyTorch?

Yes, you are right! Put MyModel.eval() after loading state_dict.
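
For reference, a minimal sketch of that order of operations (MyModel, the checkpoint path, and inputs are placeholders):

import torch

model = MyModel()
state_dict = torch.load('checkpoint.pt', map_location='cpu')
model.load_state_dict(state_dict)
model.eval()   # switch dropout/batchnorm layers to evaluation behaviour

with torch.no_grad():
    output = model(inputs)   # inference only, no gradients needed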

Hi,

So, a somewhat related question, but in the context of saving the optimizer: is saving/loading the full optimizer object the same as saving/loading only the optimizer's state_dict() when resuming training? Aside from the obvious point that saving only the state_dict() saves memory…
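
For context, the state_dict-based pattern I have in mind for resuming looks roughly like this (all names are placeholders):

# save a checkpoint for resuming training later
torch.save({
    'epoch': epoch,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
}, 'checkpoint.pth')

# resume: recreate the objects first, then restore their states
model = MyModel()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
start_epoch = checkpoint['epoch'] + 1
model.train()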

In my case, the results are correct without eval() but not with it.

I am training a Generative Adversarial Network (GAN).

Wow. I did not think of that, but this also worked for me and I have no idea why.

Hello Bixqu, how are you?

To save the model to a .pt file and load it again, please see the GitHub repository at the link below:

Simple way to save and load model in pytorch

You can also read the blog post on it at the link below.

Sorry, I have not written anything yet on how to continue training from the last epoch; I will write about this as well. But I hope you will gain some insights from this repository.

With Regards
Sanpreet Singh

What is wrong with doing:

def save_ckpt(path_to_ckpt, crazy_mdl):
    from pathlib import Path
    import dill as pickle
    ## Make dir. Throw no exceptions if it already exists
    path_to_ckpt = Path(path_to_ckpt)
    path_to_ckpt.mkdir(parents=True, exist_ok=True)
    ckpt_path_plus_path = path_to_ckpt / 'db'

    ## Pickle the whole model object with dill
    db = {'crazy_mdl': crazy_mdl}
    with open(ckpt_path_plus_path, 'wb') as db_file:  # 'wb' overwrites instead of appending
        pickle.dump(db, db_file)


Hello,

I am facing a similar problem. I am getting mAP of 0.5 for my object detection algorithm when I evaluate during training, but when I try to evaluate the model using saved checkpoints, the mAP drops to 0.3 or lower.

I am saving the optimizer state along with the weights, according to what I read in previous discussions. I am also calling .eval() after loading the model weights and optimizer state.

However, I am not sure how to use the optimizer state during inference as there is no backward pass while evaluating the model.

Also, when I try to evaluate without model.eval(), the results are better (mAP = 0.42), which I find a bit strange.

Is there any specific reason for this?
Thank you

Loading the optimizer's state_dict would not be necessary if you only run inference.
The validation performance might increase if the model is kept in training mode, since e.g. the batchnorm layers would then use the batch stats instead of the running stats, in case the running stats are not a good fit for the validation dataset. However, this might be seen as data leaking.
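
As a self-contained illustration of the batchnorm difference between the two modes (random data, just for demonstration):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 16, 16) * 5 + 2   # batch stats far away from the default running stats

bn.train()
out_train = bn(x)   # normalizes with the batch stats and updates running_mean/running_var

bn.eval()
out_eval = bn(x)    # normalizes with the stored running stats only

print(out_train.mean().item(), out_eval.mean().item())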

What is the mAP after training using model.eval() and then in a new script after restoring the model?

Okay. So is it advisable to use the model in training mode for inference or not?

I performed an experiment to understand how much the mAP deviates when evaluated later with and without .eval() compared to during training; the results are as follows:

Experiment     During training c=0.5   .eval() c=0.5   .eval() c=0.1   no .eval() c=0.5   no .eval() c=0.1
rotperson1     0.52                    0.34            0.48            0.41               0.51
rotperson9     0.53                    0.35            0.41            0.46               0.52

In the above results, c denotes the confidence threshold used. I had saved two different checkpoints, and for both checkpoints the confidence threshold was 0.5 during training. However, when I evaluate the model from the checkpoints with the exact same setup, the mAP drops, while it is a little higher when conf_thresh is set to 0.1 during evaluation.

I’m not sure I understand the issue correctly.
Are you seeing a significant model performance drop after loading the state_dict (checkpoint) or just by switching to model.eval()?
The latter case might be due to e.g. “wrong” internal stats from batchnorm layers, which might be the case if the data wasn’t properly shuffled/split and/or is sampled from different data domains.

It might depend on your use case, but I would recommend using model.eval() during the validation loop. Otherwise you'll update the internal batchnorm stats, which might result in a better performance on the validation and test sets, but could also add a bias, and the test-set metric would most likely not reflect "unseen" samples anymore.
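
A typical validation loop along these lines could look like this (val_loader and the metric update are placeholders):

model.eval()
with torch.no_grad():          # no gradients needed during validation
    for imgs, targets in val_loader:
        outputs = model(imgs)
        # accumulate your metric here, e.g. the mAP computation
model.train()                  # switch back before the next training epoch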

The issue is the performance drop after loading the state_dict.
Regarding switching to model.eval(), I just read in one of the forum discussions that the network may give better results when used in train mode for inference, so I just tested it in the experiment.

My main concern is that the performance drops when I load the state_dict (checkpoint) later for evaluation, even when the rest of the setup is the same, including the evaluation function and the evaluation dataset.

OK, in that case could you post the results for:

What is the mAP after training using model.eval() and then in a new script after restoring the model?

The mAP during training using model.eval() was 0.52, and in the new script after restoring the model it was 0.34.

Thanks, could you post the model definition as well as all input shapes, please?

from __future__ import division

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np

from utils.parse_config import *
from utils.utils import build_targets, to_cpu, non_max_suppression

import matplotlib.pyplot as plt
import matplotlib.patches as patches


def create_modules(module_defs):
    """
    Constructs module list of layer blocks from module configuration in module_defs
    """
    hyperparams = module_defs.pop(0)
    output_filters = [int(hyperparams["channels"])]
    module_list = nn.ModuleList()
    for module_i, module_def in enumerate(module_defs):
        modules = nn.Sequential()

        if module_def["type"] == "convolutional":
            bn = int(module_def["batch_normalize"])
            filters = int(module_def["filters"])
            kernel_size = int(module_def["size"])
            pad = (kernel_size - 1) // 2
            modules.add_module(
                f"conv_{module_i}",
                nn.Conv2d(
                    in_channels=output_filters[-1],
                    out_channels=filters,
                    kernel_size=kernel_size,
                    stride=int(module_def["stride"]),
                    padding=pad,
                    bias=not bn,
                ),
            )
            if bn:
                modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
            if module_def["activation"] == "leaky":
                modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))

        elif module_def["type"] == "maxpool":
            kernel_size = int(module_def["size"])
            stride = int(module_def["stride"])
            if kernel_size == 2 and stride == 1:
                modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
            maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2))
            modules.add_module(f"maxpool_{module_i}", maxpool)

        elif module_def["type"] == "upsample":
            upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
            modules.add_module(f"upsample_{module_i}", upsample)

        elif module_def["type"] == "route":
            layers = [int(x) for x in module_def["layers"].split(",")]
            filters = sum([output_filters[1:][i] for i in layers])
            modules.add_module(f"route_{module_i}", EmptyLayer())

        elif module_def["type"] == "shortcut":
            filters = output_filters[1:][int(module_def["from"])]
            modules.add_module(f"shortcut_{module_i}", EmptyLayer())

        elif module_def["type"] == "yolo":
            anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
            # Extract anchors
            anchors = [int(x) for x in module_def["anchors"].split(",")]
            anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
            anchors = [anchors[i] for i in anchor_idxs]
            num_classes = int(module_def["classes"])
            img_size = int(hyperparams["height"])
            # Define detection layer
            yolo_layer = YOLOLayer(anchors, num_classes, img_size)
            modules.add_module(f"yolo_{module_i}", yolo_layer)
        # Register module list and number of output filters
        module_list.append(modules)
        output_filters.append(filters)

    return hyperparams, module_list


class Upsample(nn.Module):
    """ nn.Upsample is deprecated """

    def __init__(self, scale_factor, mode="nearest"):
        super(Upsample, self).__init__()
        self.scale_factor = scale_factor
        self.mode = mode

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
        return x


class EmptyLayer(nn.Module):
    """Placeholder for 'route' and 'shortcut' layers"""

    def __init__(self):
        super(EmptyLayer, self).__init__()


class YOLOLayer(nn.Module):
    """Detection layer"""

    def __init__(self, anchors, num_classes, img_dim=416):
        super(YOLOLayer, self).__init__()
        self.anchors = anchors
        self.num_anchors = len(anchors)
        self.num_classes = num_classes
        self.ignore_thres = 0.5
        self.mse_loss = nn.MSELoss()
        self.bce_loss = nn.BCELoss()
        self.obj_scale = 1
        self.noobj_scale = 100
        self.metrics = {}
        self.img_dim = img_dim
        self.grid_size = 0  # grid size
        self.angle_range = 180   # 180 or 360


    def rotation_loss(self,pred_angle,actual_angle):
        theta_pred = pred_angle
        theta_gt = actual_angle
        dt = theta_pred - theta_gt

        # periodic SE
        dt = torch.abs(torch.remainder(dt-np.pi/2,np.pi) - np.pi/2)

        assert (dt >= 0).all()
        loss = dt.sum()

        return loss


    def compute_grid_offsets(self, grid_size, cuda=True):
        self.grid_size = grid_size
        g = self.grid_size
        FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
        self.stride = self.img_dim / self.grid_size
        # Calculate offsets for each grid
        self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
        self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
        self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
        self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
        self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))

    def forward(self, x, targets=None, img_dim=None):

        # Tensors for cuda support
        FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
        LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
        ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor

        self.img_dim = img_dim
        num_samples = x.size(0)
        grid_size = x.size(2)

        prediction = (
            x.view(num_samples, self.num_anchors, self.num_classes + 6, grid_size, grid_size)
            .permute(0, 1, 3, 4, 2)
            .contiguous()
        )

        # Get outputs
        x = torch.sigmoid(prediction[..., 0])  # Center x
        y = torch.sigmoid(prediction[..., 1])  # Center y
        w = prediction[..., 2]  # Width
        h = prediction[..., 3]  # Height
        angle = torch.sigmoid(prediction[...,4])
        pred_conf = torch.sigmoid(prediction[..., 5])  # Conf
        pred_cls = torch.sigmoid(prediction[..., 6:])  # Cls pred.

        # If grid size does not match current we compute new offsets
        if grid_size != self.grid_size:
            self.compute_grid_offsets(grid_size, cuda=x.is_cuda)

        # Add offset and scale with anchors
        pred_boxes = FloatTensor(prediction[..., :5].shape)
        pred_boxes[..., 0] = x.data + self.grid_x
        pred_boxes[..., 1] = y.data + self.grid_y
        pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
        pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
        pred_boxes[..., 4] =   angle * self.angle_range - (self.angle_range / 2)

        pred_boxes_out = pred_boxes.detach().clone()
        pred_boxes_out[...,:4] = pred_boxes_out[...,:4] * self.stride

        output = torch.cat(
            (
                pred_boxes_out.view(num_samples, -1, 5) ,
                pred_conf.view(num_samples, -1, 1),
                pred_cls.view(num_samples, -1, self.num_classes),
            ),
            -1,
        )

        if targets is None:
            return output, 0
        else:
            iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tangle, tcls, tconf = build_targets(
                pred_boxes=pred_boxes,
                pred_cls=pred_cls,
                target=targets,
                anchors=self.scaled_anchors,
                ignore_thres=self.ignore_thres,
            )

            # Convert both the angles to radian for loss calculation
            tangle_mask = tangle[obj_mask] / 180 * np.pi
            if self.angle_range == 360:
                pangle_mask = angle[obj_mask] * 2 * np.pi - np.pi
            elif self.angle_range == 180:
                pangle_mask = angle[obj_mask] * np.pi - np.pi / 2

            # Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
            loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
            loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
            loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
            loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
            loss_a = self.rotation_loss(pangle_mask, tangle_mask)
            loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
            loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
            loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
            loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
            total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls + loss_a

            # Metrics
            cls_acc = 100 * class_mask[obj_mask].mean()
            conf_obj = pred_conf[obj_mask].mean()
            conf_noobj = pred_conf[noobj_mask].mean()
            conf50 = (pred_conf > 0.5).float()
            iou50 = (iou_scores > 0.5).float()
            iou75 = (iou_scores > 0.75).float()
            detected_mask = conf50 * class_mask * tconf
            precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
            recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
            recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)

            self.metrics = {
                "loss": to_cpu(total_loss).item(),
                "x": to_cpu(loss_x).item(),
                "y": to_cpu(loss_y).item(),
                "w": to_cpu(loss_w).item(),
                "h": to_cpu(loss_h).item(),
                "angle": to_cpu(loss_a).item(),
                "conf": to_cpu(loss_conf).item(),
                "cls": to_cpu(loss_cls).item(),
                "cls_acc": to_cpu(cls_acc).item(),
                "recall50": to_cpu(recall50).item(),
                "recall75": to_cpu(recall75).item(),
                "precision": to_cpu(precision).item(),
                "conf_obj": to_cpu(conf_obj).item(),
                "conf_noobj": to_cpu(conf_noobj).item(),
                "grid_size": grid_size,
            }

            return output, total_loss


class Darknet(nn.Module):
    """YOLOv3 object detection model"""

    def __init__(self, config_path, img_size=416):
        super(Darknet, self).__init__()
        self.module_defs = parse_model_config(config_path)
        self.hyperparams, self.module_list = create_modules(self.module_defs)
        self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
        self.img_size = img_size
        self.seen = 0
        self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)

    def forward(self, x, targets=None):
        img_dim = x.shape[2]
        loss = 0
        layer_outputs, yolo_outputs = [], []
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
                x = module(x)
            elif module_def["type"] == "route":
                x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
            elif module_def["type"] == "shortcut":
                layer_i = int(module_def["from"])
                x = layer_outputs[-1] + layer_outputs[layer_i]
            elif module_def["type"] == "yolo":
                x, layer_loss = module[0](x, targets, img_dim)
                loss += layer_loss
                yolo_outputs.append(x)
            layer_outputs.append(x)
        yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
        return yolo_outputs if targets is None else (loss, yolo_outputs)

    def load_darknet_weights(self, weights_path):
        """Parses and loads the weights stored in 'weights_path'"""

        # Open the weights file
        with open(weights_path, "rb") as f:
            header = np.fromfile(f, dtype=np.int32, count=5)  # First five are header values
            self.header_info = header  # Needed to write header when saving weights
            self.seen = header[3]  # number of images seen during training
            weights = np.fromfile(f, dtype=np.float32)  # The rest are weights

        # Establish cutoff for loading backbone weights
        cutoff = None
        if "darknet53.conv.74" in weights_path:
            cutoff = 75

        ptr = 0
        for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
            if i == cutoff:
                break
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                if module_def["batch_normalize"]:
                    # Load BN bias, weights, running mean and running variance
                    bn_layer = module[1]
                    num_b = bn_layer.bias.numel()  # Number of biases
                    # Bias
                    bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
                    bn_layer.bias.data.copy_(bn_b)
                    ptr += num_b
                    # Weight
                    bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
                    bn_layer.weight.data.copy_(bn_w)
                    ptr += num_b
                    # Running Mean
                    bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
                    bn_layer.running_mean.data.copy_(bn_rm)
                    ptr += num_b
                    # Running Var
                    bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
                    bn_layer.running_var.data.copy_(bn_rv)
                    ptr += num_b
                else:
                    # Load conv. bias
                    num_b = conv_layer.bias.numel()
                    conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
                    conv_layer.bias.data.copy_(conv_b)
                    ptr += num_b
                # Load conv. weights
                num_w = conv_layer.weight.numel()
                conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
                conv_layer.weight.data.copy_(conv_w)
                ptr += num_w

    def save_darknet_weights(self, path, cutoff=-1):
        """
            @:param path    - path of the new weights file
            @:param cutoff  - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
        """
        fp = open(path, "wb")
        self.header_info[3] = self.seen
        self.header_info.tofile(fp)

        # Iterate through layers
        for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
            if module_def["type"] == "convolutional":
                conv_layer = module[0]
                # If batch norm, load bn first
                if module_def["batch_normalize"]:
                    bn_layer = module[1]
                    bn_layer.bias.data.cpu().numpy().tofile(fp)
                    bn_layer.weight.data.cpu().numpy().tofile(fp)
                    bn_layer.running_mean.data.cpu().numpy().tofile(fp)
                    bn_layer.running_var.data.cpu().numpy().tofile(fp)
                # Load conv bias
                else:
                    conv_layer.bias.data.cpu().numpy().tofile(fp)
                # Load conv weights
                conv_layer.weight.data.cpu().numpy().tofile(fp)

        fp.close()

The input image is resized to (416, 416, 3) in the dataloader, and each target annotation contains a bounding box with 6 elements (class_label, cx, cy, w, h, angle).

I hope this is the information you asked for. Please let me know if I misunderstood something.

Thanks for the code snippet. Unfortunately, a lot of methods are undefined, so could you add them so that I can create a model, train it for some iterations, save and restore it, and debug the drop in performance after reloading the model?

Sure. However, there is an update. I was examining the code to see whether any external parameters differ. It turns out that during training I use the mean and std dev of the training dataset to normalize the pixel values of both the train and eval datasets, but when I evaluate the model later from the checkpoints, the new script uses a different mean and std dev (those of the eval dataset) instead of the training-set statistics, which should be used.

After making this correction in the code, the performance has improved greatly. The mAP during training was 0.52, and the mAP in the new script when the model is restored is 0.501. However, there is still a minor drop in mAP.
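
To keep both scripts consistent in the future, I am thinking of storing the training-set normalization values inside the checkpoint itself, roughly like this (the key names are just my own choice):

# while saving the checkpoint during training
torch.save({
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'norm_mean': mean_val,   # mean/std computed on the training set
    'norm_std': std_val,
}, 'checkpoint.pth')

# in the evaluation script
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state_dict'])
mean_val, std_val = checkpoint['norm_mean'], checkpoint['norm_std']   # reuse the training stats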

The additional definitions that are used from utils.py are as follows:

import numpy as np
import torch
import tqdm


def to_cpu(tensor):
    return tensor.detach().cpu()


def load_classes(path):
    """
    Loads class labels at 'path'
    """
    fp = open(path, "r")
    names = fp.read().split("\n")[:-1]
    return names

def load_ms(path):
    """
    Load mean and std deviation values from a text file
    """
    with open(path, "r") as ms:   ### Read mean and standard deviation
        ms_values = ms.readlines()
        ms_values = [s.strip() for s in ms_values]
        mean_val = [float(s) for s in ms_values[0].split()]
        std_val = [float(s) for s in ms_values[1].split()]
    
    return mean_val, std_val

def write_ms(path, values):
    """
    Write mean and std values to a txt file
    path: To where the file should be stored
    values: list of mean and std [[float],[float]]
    """
    text_file = open(path, 'w')
    for sing in values:
        text_file.writelines( ["%f " % item for item in sing] )
        text_file.write("\n")
    text_file.close()

def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find("Conv") != -1:
        torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find("BatchNorm2d") != -1:
        torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
        torch.nn.init.constant_(m.bias.data, 0.0)


def rescale_boxes(boxes, current_dim, original_shape):
    """ Rescales bounding boxes to the original shape """
    orig_h, orig_w = original_shape
    # The amount of padding that was added
    pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
    pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
    # Image height and width after padding is removed
    unpad_h = current_dim - pad_y
    unpad_w = current_dim - pad_x
    # Rescale bounding boxes to dimension of original image
    boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w
    boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h
    boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
    boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
    return boxes


def xywh2xyxy(x):
    y = x.new(x.shape)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y

def xyxy2xywh(x):
    y = x.new(x.shape)
    w, h = x[..., 2] - x[..., 0], x[..., 3] - x[..., 1]
    y[...,0] = x[...,0] + (w / 2)
    y[...,1] = x[...,1] + (h / 2)
    y[...,2] = w
    y[...,3] = h
    return y


def ap_per_class(tp, conf, pred_cls, target_cls):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:    True positives (list).
        conf:  Objectness value from 0-1 (list).
        pred_cls: Predicted object classes (list).
        target_cls: True object classes (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """

    # Sort by objectness
    i = np.argsort(-conf)
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]

    # Find unique classes
    unique_classes = np.unique(target_cls)

    # Create Precision-Recall curve and compute AP for each class
    ap, p, r = [], [], []
    for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
        i = pred_cls == c
        n_gt = (target_cls == c).sum()  # Number of ground truth objects
        n_p = i.sum()  # Number of predicted objects

        if n_p == 0 and n_gt == 0:
            continue
        elif n_p == 0 or n_gt == 0:
            ap.append(0)
            r.append(0)
            p.append(0)
        else:
            # Accumulate FPs and TPs
            fpc = (1 - tp[i]).cumsum()
            tpc = (tp[i]).cumsum()

            # Recall
            recall_curve = tpc / (n_gt + 1e-16)
            r.append(recall_curve[-1])

            # Precision
            precision_curve = tpc / (tpc + fpc)
            p.append(precision_curve[-1])

            # AP from recall-precision curve
            ap.append(compute_ap(recall_curve, precision_curve))

    # Compute F1 score (harmonic mean of precision and recall)
    p, r, ap = np.array(p), np.array(r), np.array(ap)
    f1 = 2 * p * r / (p + r + 1e-16)

    return p, r, ap, f1, unique_classes.astype("int32")


def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves.
    Code originally from https://github.com/rbgirshick/py-faster-rcnn.

    # Arguments
        recall:    The recall curve (list).
        precision: The precision curve (list).
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # correct AP calculation
    # first append sentinel values at the end
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))

    # compute the precision envelope
    for i in range(mpre.size - 1, 0, -1):
        mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

    # to calculate area under PR curve, look for points
    # where X axis (recall) changes value
    i = np.where(mrec[1:] != mrec[:-1])[0]

    # and sum (\Delta recall) * prec
    ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


def get_batch_statistics(outputs, targets, iou_threshold):
    """ Compute true positives, predicted scores and predicted labels per sample """
    batch_metrics = []
    for sample_i in range(len(outputs)):

        if outputs[sample_i] is None:
            continue

        output = outputs[sample_i]
        pred_boxes = output[:, :5]
        pred_scores = output[:, 5]
        pred_labels = output[:, -1]

        true_positives = np.zeros(pred_boxes.shape[0])

        annotations = targets[targets[:, 0] == sample_i][:, 1:]
        target_labels = annotations[:, 0] if len(annotations) else []
        if len(annotations):
            detected_boxes = []
            target_boxes = annotations[:, 1:]

            for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):

                # If targets are found break
                if len(detected_boxes) == len(annotations):
                    break

                # Ignore if label is not one of the target labels
                if pred_label not in target_labels:
                    continue

                #iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0)     # Only checkes once, later if detection with better iou arrives will be ignored
                iou = iou_rotated(pred_box.unsqueeze(0), target_boxes)
                mask_matched = (target_labels == pred_label) & (iou >= iou_threshold) 

                iou_matched = torch.where(mask_matched, iou, torch.zeros_like(iou))
                iou_max, box_index = iou_matched.max(0)

                #if iou >= iou_threshold and box_index not in detected_boxes and pred_label == target_labels[box_index]:
                if iou_max >= iou_threshold and box_index not in detected_boxes:
                    true_positives[pred_i] = 1
                    detected_boxes += [box_index]
        batch_metrics.append([true_positives, pred_scores, pred_labels])
    return batch_metrics

def rotate_detections(x1, y1, x2, y2, angle, xyxy=True):

    FloatTensor = torch.cuda.FloatTensor if x1.is_cuda else torch.FloatTensor

    if xyxy:
        w, h = x2 - x1, y2 - y1
        x, y = x1 + w/2, y1 + h/2   
    else:
        # Get the coordinates of bounding boxes
        x, y, w, h = x1, y1, x2, y2 

    # Get co-ordinates for rotated angle
    if not x.size():

        c, s = np.cos(angle/180*np.pi), np.sin(angle/180*np.pi)
        R = np.asarray([[c, s], [-s, c]])
        pts = np.asarray([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
        rot_pts = []
        for pt in pts:
            rot_pts.append(([x, y] + pt @ R).astype(float))
        contours = FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])
        
    else:
        contours = []

        for i in range(x.size(0)):
            c, s = np.cos(angle[i]/180*np.pi), np.sin(angle[i]/180*np.pi)
            R = np.asarray([[c, s], [-s, c]])
            pts = np.asarray([[-w[i]/2, -h[i]/2], [w[i]/2, -h[i]/2], [w[i]/2, h[i]/2], [-w[i]/2, h[i]/2]])
            rot_pts = []
            for pt in pts:
                rot_pts.append(([x[i], y[i]] + pt @ R).astype(float))
            contours += [FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])]

    return contours


def bbox_wh_iou(wh1, wh2):
    wh2 = wh2.t()
    w1, h1 = wh1[0], wh1[1]
    w2, h2 = wh2[0], wh2[1]
    inter_area = torch.min(w1, w2) * torch.min(h1, h2)
    union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
    return inter_area / union_area


def bbox_iou(box1, box2, x1y1x2y2=True):
    """
    Returns the IoU of two bounding boxes
    """
    if not x1y1x2y2:
        # Transform from center and width to exact coordinates
        b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
        b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
        b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
        b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
    else:
        # Get the coordinates of bounding boxes
        b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
        b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]

    # get the coordinates of the intersection rectangle
    inter_rect_x1 = torch.max(b1_x1, b2_x1)
    inter_rect_y1 = torch.max(b1_y1, b2_y1)
    inter_rect_x2 = torch.min(b1_x2, b2_x2)
    inter_rect_y2 = torch.min(b1_y2, b2_y2)
    # Intersection area
    inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
        inter_rect_y2 - inter_rect_y1 + 1, min=0
    )
    # Union Area
    b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
    b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

    iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

    return iou

def calculate_rotated(x, y, w, h, angle):
    '''
    angle: degree
    '''
    FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor

    w = w.item()
    h = h.item()
    #w, h = w.numpy(), h.numpy()
    c, s = np.cos(angle.item()/180*np.pi), np.sin(angle.item()/180*np.pi)
    R = np.asarray([[c, s], [-s, c]])
    pts = np.asarray([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
    rot_pts = []
    for pt in pts:
        rot_pts.append(([x, y] + pt @ R).astype(float))
    contours = FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])
    
    return contours

def iou_rotated(box1, box2, x1y1x2y2=True):

    FloatTensor = torch.cuda.FloatTensor if box1.is_cuda else torch.FloatTensor

    if not x1y1x2y2:
        #Get center co-ordinates and w & h
        b1_cx, b1_cy, b1_w, b1_h = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
        b2_cx, b2_cy, b2_w, b2_h = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
       
    else:
        # Transform co-ordinates to x,y,w,h
        b1_w, b1_h = box1[:,2] - box1[:,0], box1[:,3] - box1[:,1]
        b1_cx, b1_cy = box1[:,0] + b1_w / 2, box1[:,1] + b1_h / 2
        
        b2_w, b2_h = box2[:,2] - box2[:,0], box2[:,3] - box2[:,1]
        b2_cx, b2_cy = box2[:,0] + b2_w / 2, box2[:,1] + b2_h / 2
        
    
    #get angle for rotation for all bounding boxes
    angle_1 = box1[:,-1]
    angle_2 = box2[:,-1]

    if len(box1) == 1:
        iou_all = FloatTensor(box2.size(0)).fill_(0)
        for i in range(len(box2)):
            #Check if any element equals to infinity
            if box1[0,0]==np.inf  or box1[0,1]==np.inf or box1[0,2]==np.inf or box1[0,3]==np.inf \
            or box2[i,0]==np.inf or box2[i,1]==np.inf or box2[i,2]==np.inf or box2[i,3]==np.inf:
                iou = 1e-12
            else:
                rot_box1 = calculate_rotated(b1_cx[0], b1_cy[0], b1_w[0], b1_h[0], angle_1[0])
                rot_box2 = calculate_rotated(b2_cx[i], b2_cy[i], b2_w[i], b2_h[i], angle_2[i])

                b1_x1, b1_y1 = rot_box1.min(0)[0]
                b1_x2, b1_y2 = rot_box1.max(0)[0]
                b2_x1, b2_y1 = rot_box2.min(0)[0]
                b2_x2, b2_y2 = rot_box2.max(0)[0]

                # get the co-ordinates of the intersection rectangle
                inter_rect_x1 = torch.max(b1_x1, b2_x1)
                inter_rect_y1 = torch.max(b1_y1, b2_y1)
                inter_rect_x2 = torch.min(b1_x2, b2_x2)
                inter_rect_y2 = torch.min(b1_y2, b2_y2)
                # Intersection area
                inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
                    inter_rect_y2 - inter_rect_y1 + 1, min=0
                )
                # Union Area
                b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
                b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

                iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)


                # rot_box1 = Polygon( [ rot_box1[0], rot_box1[1], rot_box1[2], rot_box1[3] ] )
                # rot_box2 = Polygon( [ rot_box2[0], rot_box2[1], rot_box2[2], rot_box2[3] ] )
                # # Intersection area
                # inter_area = rot_box1.intersection(rot_box2).area
                # # Union Area
                # union_area = rot_box1.union(rot_box2).area

                # iou = inter_area / (union_area + 1e-9)

            iou_all[i] = iou

        return iou_all

    else:
        assert(len(box1) == len(box2))
        #rotate every bbox
        iou_all = FloatTensor(box1.size(0)).fill_(0)
        for i in range(len(box1)):
            #Check if any element equals to infinity
            if box1[i,0]==np.inf or box1[i,1]==np.inf or box1[i,2]==np.inf or box1[i,3]==np.inf \
            or box2[i,0]==np.inf or box2[i,1]==np.inf or box2[i,2]==np.inf or box2[i,3]==np.inf:
                iou = 1e-12
            else:
                rot_box1 = calculate_rotated(b1_cx[i], b1_cy[i], b1_w[i], b1_h[i], angle_1[i])
                rot_box2 = calculate_rotated(b2_cx[i], b2_cy[i], b2_w[i], b2_h[i], angle_2[i])

                b1_x1, b1_y1 = rot_box1.min(0)[0]
                b1_x2, b1_y2 = rot_box1.max(0)[0]
                b2_x1, b2_y1 = rot_box2.min(0)[0]
                b2_x2, b2_y2 = rot_box2.max(0)[0]

                # get the co-ordinates of the intersection rectangle
                inter_rect_x1 = torch.max(b1_x1, b2_x1)
                inter_rect_y1 = torch.max(b1_y1, b2_y1)
                inter_rect_x2 = torch.min(b1_x2, b2_x2)
                inter_rect_y2 = torch.min(b1_y2, b2_y2)
                # Intersection area
                inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
                    inter_rect_y2 - inter_rect_y1 + 1, min=0
                )
                # Union Area
                b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
                b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)

                iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)

                # rot_box1 = Polygon( [ rot_box1[0], rot_box1[1], rot_box1[2], rot_box1[3] ] )
                # rot_box2 = Polygon( [ rot_box2[0], rot_box2[1], rot_box2[2], rot_box2[3] ] )
                # # Intersection area
                # inter_area = rot_box1.intersection(rot_box2).area
                # # Union Area
                # union_area = rot_box1.union(rot_box2).area

                # iou = inter_area / union_area
            iou_all[i] = iou

        return iou_all


def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
    """
    Removes detections with lower object confidence score than 'conf_thres' and performs
    Non-Maximum Suppression to further filter detections.
    Returns detections with shape:
        (x1, y1, x2, y2, object_conf, class_score, class_pred)
    """

    # From (center x, center y, width, height) to (x1, y1, x2, y2)
    prediction[..., :4] = xywh2xyxy(prediction[..., :4])
    output = [None for _ in range(len(prediction))]
    for image_i, image_pred in enumerate(prediction):
        # Filter out confidence scores below threshold
        image_pred = image_pred[image_pred[:, 5] >= conf_thres]
        # If none are remaining => process next image
        if not image_pred.size(0):
            continue
        # Object confidence times class confidence
        score = image_pred[:, 5] * image_pred[:, 6:].max(1)[0]
        # Sort by it
        image_pred = image_pred[(-score).argsort()]
        class_confs, class_preds = image_pred[:, 6:].max(1, keepdim=True)
        detections = torch.cat((image_pred[:, :6], class_confs.float(), class_preds.float()), 1)
        # Perform non-maximum suppression
        keep_boxes = []
        while detections.size(0):
            large_overlap = iou_rotated(detections[0, :5].unsqueeze(0), detections[:, :5]) > nms_thres
            label_match = detections[0, -1] == detections[:, -1]
            # Indices of boxes with lower confidence scores, large IOUs and matching labels
            invalid = large_overlap & label_match
            weights = detections[invalid, 5:6]
            # Merge overlapping bboxes by order of confidence
            detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
            keep_boxes += [detections[0]]
            detections = detections[~invalid]
        if keep_boxes:
            output[image_i] = torch.stack(keep_boxes)
    
    # for o_i, out in enumerate(output):
    #     if out == None:
    #         output[o_i] = torch.zeros(1,8)
            

    return output


def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):

    ByteTensor = torch.cuda.BoolTensor if pred_boxes.is_cuda else torch.BoolTensor
    FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor

    nB = pred_boxes.size(0)
    nA = pred_boxes.size(1)
    nC = pred_cls.size(-1)
    nG = pred_boxes.size(2)
    nt = target.size(0)

    # Output tensors
    obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
    noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
    class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
    iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
    tx = FloatTensor(nB, nA, nG, nG).fill_(0)
    ty = FloatTensor(nB, nA, nG, nG).fill_(0)
    tw = FloatTensor(nB, nA, nG, nG).fill_(0)
    th = FloatTensor(nB, nA, nG, nG).fill_(0)
    tangle = FloatTensor(nB, nA, nG, nG).fill_(0)
    tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
    target_boxes = FloatTensor(nt,5).fill_(0)

    # Convert to position relative to box
    target_boxes[:,:4] = target[:, 2:6] * nG
    target_boxes[:,4] = target[:, 6]
    gxy = target_boxes[:, :2]
    gwh = target_boxes[:, 2:4]
    gangle = target_boxes[:, 4]
    # Get anchors with best iou
    ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
    best_ious, best_n = ious.max(0)
    # Separate target values
    b, target_labels = target[:, :2].long().t()
    gx, gy = gxy.t()
    gw, gh = gwh.t()
    gi, gj = gxy.long().t()
    # Set masks
    obj_mask[b, best_n, gj, gi] = 1
    noobj_mask[b, best_n, gj, gi] = 0

    # Set noobj mask to zero where iou exceeds ignore threshold
    for i, anchor_ious in enumerate(ious.t()):
        noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0

    # Coordinates
    tx[b, best_n, gj, gi] = gx - gx.floor()
    ty[b, best_n, gj, gi] = gy - gy.floor()
    # Width and height
    tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
    th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
    # Angle
    tangle[b, best_n, gj, gi] = gangle
    # One-hot encoding of label
    tcls[b, best_n, gj, gi, target_labels] = 1
    # Compute label correctness and iou at best anchor
    class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
    iou_scores[b, best_n, gj, gi] = iou_rotated(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)

    tconf = obj_mask.float()
    return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tangle, tcls, tconf

I think the best solution for this kind of need is using pytorch_lightning.
Loading weights and training: Saving and loading weights — PyTorch Lightning 1.1.5 documentation
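
Roughly (assuming your model is wrapped in a LightningModule subclass, here called LitModel):

from pytorch_lightning import Trainer

trainer = Trainer(max_epochs=10)
trainer.fit(lit_model, train_loader)
trainer.save_checkpoint('example.ckpt')   # stores weights, optimizer state, epoch, hparams

# later, restore everything from the checkpoint
restored = LitModel.load_from_checkpoint('example.ckpt')
restored.eval()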

Thanks for the suggestion. I will try that later. Actually, I am working on my master's thesis, so I am kind of short on time. And if I understand correctly, I would have to convert my whole training script to the PyTorch Lightning structure, which would take quite some time.

However, for me the problem is solved, as I can move ahead with this small discrepancy in performance, but I was curious about what causes the drop in performance with the exact same setup. I would like to know if there is a valid reason.