Yes, you are right! Call MyModel.eval() after loading the state_dict.
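For reference, a minimal sketch of that order of operations (the model here is a hypothetical stand-in for MyModel; any nn.Module behaves the same way):

import torch
import torch.nn as nn

# Hypothetical stand-in for MyModel
model = nn.Sequential(nn.Linear(10, 2))
torch.save(model.state_dict(), 'checkpoint.pth')

# Later / in a new script:
model.load_state_dict(torch.load('checkpoint.pth'))
model.eval()  # call eval() *after* load_state_dict, before running inference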
Hi,
A sort of related question, but in the context of saving the optimizer: is saving/loading the full optimizer object the same as saving/loading only the optimizer’s state_dict() when resuming training? Aside from the obvious point that saving only the state_dict() saves memory…
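For concreteness, a minimal sketch of the two approaches being compared (model, file names, and hyperparameters are illustrative):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Option 1: pickle the full optimizer object
torch.save(optimizer, 'optim_full.pth')

# Option 2: save only its state_dict (the usual recommendation)
torch.save(optimizer.state_dict(), 'optim_state.pth')

# Resuming with option 2: rebuild the optimizer, then restore its state
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.load_state_dict(torch.load('optim_state.pth'))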
In my case, the results are correct without eval() but not with it.
I am training a Generative Adversarial Network (GAN).
Wow. I did not think of that, but this also worked for me and I have no idea why.
Hello Bixqu, how are you?
To save the model to a .pt file and load it, please see the GitHub repository linked below:
Simple way to save and load model in pytorch
You can also read the blog post on it, linked below.
Sorry, I have not yet written anything about how to continue training from the last epoch; I will write about this as well. But I hope you will gain some insights from this repository.
With Regards
Sanpreet Singh
What is wrong with doing:
def save_ckpt(path_to_ckpt, crazy_mdl):
    from pathlib import Path
    import dill as pickle
    ## Make dir. Throw no exceptions if it already exists
    path_to_ckpt.mkdir(parents=True, exist_ok=True)
    ckpt_path_plus_path = path_to_ckpt / Path('db')
    ## Pickle the model object itself (dill can serialize more than plain pickle)
    db = {'crazy_mdl': crazy_mdl}
    with open(ckpt_path_plus_path, 'wb') as db_file:  # 'wb' overwrites; 'ab' would stack a new pickle on every call
        pickle.dump(db, db_file)
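For comparison, the commonly recommended alternative is to checkpoint the state_dicts with torch.save rather than pickling the whole model object. A sketch with illustrative names:

import torch

def save_ckpt_state_dict(path, model, optimizer, epoch):
    # Serialize only the state_dicts plus any bookkeeping you need
    torch.save({
        'epoch': epoch,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)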
Hello,
I am facing a similar problem. I am getting an mAP of 0.5 for my object detection algorithm when I evaluate during training, but when I try to evaluate the model using saved checkpoints, the mAP drops to 0.3 or lower.
I am saving the optimizer state along with the weights, according to what I read in previous discussions. I am also calling .eval() after loading the model weights and optimizer state.
However, I am not sure how to use the optimizer state during inference as there is no backward pass while evaluating the model.
Also, when I try to evaluate without model.eval(), the results are better (mAP = 0.42), which I find a bit strange.
Is there any specific reason for this?
Thank you
Loading the state_dict from the optimizer would not be necessary if you only run inference.
The validation performance might increase if the model is kept in training mode, since e.g. the batchnorm layers would use the batch stats instead of the running stats, in case the running stats are not a good fit for the validation dataset. However, this might be seen as data leakage.
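The batchnorm difference is easy to see directly; a minimal sketch:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(8, 3, 4, 4) * 5 + 10  # stats far from the initial running stats

bn.train()
out_train = bn(x)  # normalizes with the *batch* mean/var (and updates running stats)

bn.eval()
out_eval = bn(x)   # normalizes with the stored *running* mean/var

print(out_train.mean().item(), out_eval.mean().item())  # noticeably different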
What is the mAP after training using model.eval(), and then in a new script after restoring the model?
Okay. So is it advisable to use the model in training mode for inference or not?
I performed an experiment to understand how much the mAP deviates when evaluated later with and without .eval() compared to training. The results are as follows:
Experiment | During training (c=0.5) | With .eval() (c=0.5) | With .eval() (c=0.1) | Without .eval() (c=0.5) | Without .eval() (c=0.1)
---|---|---|---|---|---
rotperson1 | 0.52 | 0.34 | 0.48 | 0.41 | 0.51
rotperson9 | 0.53 | 0.35 | 0.41 | 0.46 | 0.52
In the above results, c denotes the confidence_threshold used. I had saved two different checkpoints, and for both checkpoints the confidence_threshold was 0.5 during training. However, when I try to evaluate the model with the exact same setup, the mAP drops, while it gives a slightly higher mAP when conf_thresh is set to 0.1 during evaluation with checkpoints.
I’m not sure I understand the issue correctly. Are you seeing a significant model performance drop after loading the state_dict (checkpoint), or just by switching to model.eval()?
The latter case might be due to e.g. “wrong” internal stats from the batchnorm layers, which might happen if the data wasn’t properly shuffled/split and/or is sampled from different data domains.
It might depend on your use case, but I would recommend using model.eval() during the validation loop. Otherwise you’ll update the internal batchnorm stats, which might result in better performance on the validation and test sets, but could also add a bias, and the test set metric would most likely no longer reflect “unseen” samples.
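In practice the validation loop would look something like this (a sketch; the loader and the metric are placeholders):

import torch

@torch.no_grad()  # gradients are not needed for validation
def validate(model, loader, device):
    model.eval()  # use running batchnorm stats, disable dropout
    correct = total = 0
    for images, targets in loader:
        outputs = model(images.to(device))
        # placeholder metric: a simple accuracy-style count
        correct += (outputs.argmax(1) == targets.to(device)).sum().item()
        total += targets.size(0)
    model.train()  # switch back before resuming training
    return correct / max(total, 1)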
The issue is the performance drop after loading the state_dict.
Regarding switching to model.eval(), I just read in one of the forum discussions that a network may give better results when used in train mode for inference, so I tested that in the experiment.
My main concern is that the performance drops when I load the state_dict (checkpoint) later for evaluation, even when the rest of the setup is the same, including the function used for evaluation and the evaluation dataset.
OK, in that case could you post the results for:
> What is the mAP after training using model.eval(), and then in a new script after restoring the model?
The mAP during training using model.eval() was 0.52, and then in the new script after restoring the model it was 0.34.
Thanks, could you post the model definition as well as all input shapes, please?
from __future__ import division
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable
import numpy as np
from utils.parse_config import *
from utils.utils import build_targets, to_cpu, non_max_suppression
import matplotlib.pyplot as plt
import matplotlib.patches as patches
def create_modules(module_defs):
"""
Constructs module list of layer blocks from module configuration in module_defs
"""
hyperparams = module_defs.pop(0)
output_filters = [int(hyperparams["channels"])]
module_list = nn.ModuleList()
for module_i, module_def in enumerate(module_defs):
modules = nn.Sequential()
if module_def["type"] == "convolutional":
bn = int(module_def["batch_normalize"])
filters = int(module_def["filters"])
kernel_size = int(module_def["size"])
pad = (kernel_size - 1) // 2
modules.add_module(
f"conv_{module_i}",
nn.Conv2d(
in_channels=output_filters[-1],
out_channels=filters,
kernel_size=kernel_size,
stride=int(module_def["stride"]),
padding=pad,
bias=not bn,
),
)
if bn:
modules.add_module(f"batch_norm_{module_i}", nn.BatchNorm2d(filters, momentum=0.9, eps=1e-5))
if module_def["activation"] == "leaky":
modules.add_module(f"leaky_{module_i}", nn.LeakyReLU(0.1))
elif module_def["type"] == "maxpool":
kernel_size = int(module_def["size"])
stride = int(module_def["stride"])
if kernel_size == 2 and stride == 1:
modules.add_module(f"_debug_padding_{module_i}", nn.ZeroPad2d((0, 1, 0, 1)))
maxpool = nn.MaxPool2d(kernel_size=kernel_size, stride=stride, padding=int((kernel_size - 1) // 2))
modules.add_module(f"maxpool_{module_i}", maxpool)
elif module_def["type"] == "upsample":
upsample = Upsample(scale_factor=int(module_def["stride"]), mode="nearest")
modules.add_module(f"upsample_{module_i}", upsample)
elif module_def["type"] == "route":
layers = [int(x) for x in module_def["layers"].split(",")]
filters = sum([output_filters[1:][i] for i in layers])
modules.add_module(f"route_{module_i}", EmptyLayer())
elif module_def["type"] == "shortcut":
filters = output_filters[1:][int(module_def["from"])]
modules.add_module(f"shortcut_{module_i}", EmptyLayer())
elif module_def["type"] == "yolo":
anchor_idxs = [int(x) for x in module_def["mask"].split(",")]
# Extract anchors
anchors = [int(x) for x in module_def["anchors"].split(",")]
anchors = [(anchors[i], anchors[i + 1]) for i in range(0, len(anchors), 2)]
anchors = [anchors[i] for i in anchor_idxs]
num_classes = int(module_def["classes"])
img_size = int(hyperparams["height"])
# Define detection layer
yolo_layer = YOLOLayer(anchors, num_classes, img_size)
modules.add_module(f"yolo_{module_i}", yolo_layer)
# Register module list and number of output filters
module_list.append(modules)
output_filters.append(filters)
return hyperparams, module_list
class Upsample(nn.Module):
""" nn.Upsample is deprecated """
def __init__(self, scale_factor, mode="nearest"):
super(Upsample, self).__init__()
self.scale_factor = scale_factor
self.mode = mode
def forward(self, x):
x = F.interpolate(x, scale_factor=self.scale_factor, mode=self.mode)
return x
class EmptyLayer(nn.Module):
"""Placeholder for 'route' and 'shortcut' layers"""
def __init__(self):
super(EmptyLayer, self).__init__()
class YOLOLayer(nn.Module):
"""Detection layer"""
def __init__(self, anchors, num_classes, img_dim=416):
super(YOLOLayer, self).__init__()
self.anchors = anchors
self.num_anchors = len(anchors)
self.num_classes = num_classes
self.ignore_thres = 0.5
self.mse_loss = nn.MSELoss()
self.bce_loss = nn.BCELoss()
self.obj_scale = 1
self.noobj_scale = 100
self.metrics = {}
self.img_dim = img_dim
self.grid_size = 0 # grid size
self.angle_range = 180 # 180 or 360
def rotation_loss(self,pred_angle,actual_angle):
theta_pred = pred_angle
theta_gt = actual_angle
dt = theta_pred - theta_gt
# periodic SE
dt = torch.abs(torch.remainder(dt-np.pi/2,np.pi) - np.pi/2)
assert (dt >= 0).all()
loss = dt.sum()
return loss
def compute_grid_offsets(self, grid_size, cuda=True):
self.grid_size = grid_size
g = self.grid_size
FloatTensor = torch.cuda.FloatTensor if cuda else torch.FloatTensor
self.stride = self.img_dim / self.grid_size
# Calculate offsets for each grid
self.grid_x = torch.arange(g).repeat(g, 1).view([1, 1, g, g]).type(FloatTensor)
self.grid_y = torch.arange(g).repeat(g, 1).t().view([1, 1, g, g]).type(FloatTensor)
self.scaled_anchors = FloatTensor([(a_w / self.stride, a_h / self.stride) for a_w, a_h in self.anchors])
self.anchor_w = self.scaled_anchors[:, 0:1].view((1, self.num_anchors, 1, 1))
self.anchor_h = self.scaled_anchors[:, 1:2].view((1, self.num_anchors, 1, 1))
def forward(self, x, targets=None, img_dim=None):
# Tensors for cuda support
FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
LongTensor = torch.cuda.LongTensor if x.is_cuda else torch.LongTensor
ByteTensor = torch.cuda.ByteTensor if x.is_cuda else torch.ByteTensor
self.img_dim = img_dim
num_samples = x.size(0)
grid_size = x.size(2)
prediction = (
x.view(num_samples, self.num_anchors, self.num_classes + 6, grid_size, grid_size)
.permute(0, 1, 3, 4, 2)
.contiguous()
)
# Get outputs
x = torch.sigmoid(prediction[..., 0]) # Center x
y = torch.sigmoid(prediction[..., 1]) # Center y
w = prediction[..., 2] # Width
h = prediction[..., 3] # Height
angle = torch.sigmoid(prediction[...,4])
pred_conf = torch.sigmoid(prediction[..., 5]) # Conf
pred_cls = torch.sigmoid(prediction[..., 6:]) # Cls pred.
# If grid size does not match current we compute new offsets
if grid_size != self.grid_size:
self.compute_grid_offsets(grid_size, cuda=x.is_cuda)
# Add offset and scale with anchors
pred_boxes = FloatTensor(prediction[..., :5].shape)
pred_boxes[..., 0] = x.data + self.grid_x
pred_boxes[..., 1] = y.data + self.grid_y
pred_boxes[..., 2] = torch.exp(w.data) * self.anchor_w
pred_boxes[..., 3] = torch.exp(h.data) * self.anchor_h
pred_boxes[..., 4] = angle * self.angle_range - (self.angle_range / 2)
pred_boxes_out = pred_boxes.detach().clone()
pred_boxes_out[...,:4] = pred_boxes_out[...,:4] * self.stride
output = torch.cat(
(
pred_boxes_out.view(num_samples, -1, 5) ,
pred_conf.view(num_samples, -1, 1),
pred_cls.view(num_samples, -1, self.num_classes),
),
-1,
)
if targets is None:
return output, 0
else:
iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tangle, tcls, tconf = build_targets(
pred_boxes=pred_boxes,
pred_cls=pred_cls,
target=targets,
anchors=self.scaled_anchors,
ignore_thres=self.ignore_thres,
)
# Convert both the angles to radian for loss calculation
tangle_mask = tangle[obj_mask] / 180 * np.pi
if self.angle_range == 360:
pangle_mask = angle[obj_mask] * 2 * np.pi - np.pi
elif self.angle_range == 180:
pangle_mask = angle[obj_mask] * np.pi - np.pi / 2
# Loss : Mask outputs to ignore non-existing objects (except with conf. loss)
loss_x = self.mse_loss(x[obj_mask], tx[obj_mask])
loss_y = self.mse_loss(y[obj_mask], ty[obj_mask])
loss_w = self.mse_loss(w[obj_mask], tw[obj_mask])
loss_h = self.mse_loss(h[obj_mask], th[obj_mask])
loss_a = self.rotation_loss(pangle_mask, tangle_mask)
loss_conf_obj = self.bce_loss(pred_conf[obj_mask], tconf[obj_mask])
loss_conf_noobj = self.bce_loss(pred_conf[noobj_mask], tconf[noobj_mask])
loss_conf = self.obj_scale * loss_conf_obj + self.noobj_scale * loss_conf_noobj
loss_cls = self.bce_loss(pred_cls[obj_mask], tcls[obj_mask])
total_loss = loss_x + loss_y + loss_w + loss_h + loss_conf + loss_cls + loss_a
# Metrics
cls_acc = 100 * class_mask[obj_mask].mean()
conf_obj = pred_conf[obj_mask].mean()
conf_noobj = pred_conf[noobj_mask].mean()
conf50 = (pred_conf > 0.5).float()
iou50 = (iou_scores > 0.5).float()
iou75 = (iou_scores > 0.75).float()
detected_mask = conf50 * class_mask * tconf
precision = torch.sum(iou50 * detected_mask) / (conf50.sum() + 1e-16)
recall50 = torch.sum(iou50 * detected_mask) / (obj_mask.sum() + 1e-16)
recall75 = torch.sum(iou75 * detected_mask) / (obj_mask.sum() + 1e-16)
self.metrics = {
"loss": to_cpu(total_loss).item(),
"x": to_cpu(loss_x).item(),
"y": to_cpu(loss_y).item(),
"w": to_cpu(loss_w).item(),
"h": to_cpu(loss_h).item(),
"angle": to_cpu(loss_a).item(),
"conf": to_cpu(loss_conf).item(),
"cls": to_cpu(loss_cls).item(),
"cls_acc": to_cpu(cls_acc).item(),
"recall50": to_cpu(recall50).item(),
"recall75": to_cpu(recall75).item(),
"precision": to_cpu(precision).item(),
"conf_obj": to_cpu(conf_obj).item(),
"conf_noobj": to_cpu(conf_noobj).item(),
"grid_size": grid_size,
}
return output, total_loss
class Darknet(nn.Module):
"""YOLOv3 object detection model"""
def __init__(self, config_path, img_size=416):
super(Darknet, self).__init__()
self.module_defs = parse_model_config(config_path)
self.hyperparams, self.module_list = create_modules(self.module_defs)
self.yolo_layers = [layer[0] for layer in self.module_list if hasattr(layer[0], "metrics")]
self.img_size = img_size
self.seen = 0
self.header_info = np.array([0, 0, 0, self.seen, 0], dtype=np.int32)
def forward(self, x, targets=None):
img_dim = x.shape[2]
loss = 0
layer_outputs, yolo_outputs = [], []
for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
if module_def["type"] in ["convolutional", "upsample", "maxpool"]:
x = module(x)
elif module_def["type"] == "route":
x = torch.cat([layer_outputs[int(layer_i)] for layer_i in module_def["layers"].split(",")], 1)
elif module_def["type"] == "shortcut":
layer_i = int(module_def["from"])
x = layer_outputs[-1] + layer_outputs[layer_i]
elif module_def["type"] == "yolo":
x, layer_loss = module[0](x, targets, img_dim)
loss += layer_loss
yolo_outputs.append(x)
layer_outputs.append(x)
yolo_outputs = to_cpu(torch.cat(yolo_outputs, 1))
return yolo_outputs if targets is None else (loss, yolo_outputs)
def load_darknet_weights(self, weights_path):
"""Parses and loads the weights stored in 'weights_path'"""
# Open the weights file
with open(weights_path, "rb") as f:
header = np.fromfile(f, dtype=np.int32, count=5) # First five are header values
self.header_info = header # Needed to write header when saving weights
self.seen = header[3] # number of images seen during training
weights = np.fromfile(f, dtype=np.float32) # The rest are weights
# Establish cutoff for loading backbone weights
cutoff = None
if "darknet53.conv.74" in weights_path:
cutoff = 75
ptr = 0
for i, (module_def, module) in enumerate(zip(self.module_defs, self.module_list)):
if i == cutoff:
break
if module_def["type"] == "convolutional":
conv_layer = module[0]
if module_def["batch_normalize"]:
# Load BN bias, weights, running mean and running variance
bn_layer = module[1]
num_b = bn_layer.bias.numel() # Number of biases
# Bias
bn_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.bias)
bn_layer.bias.data.copy_(bn_b)
ptr += num_b
# Weight
bn_w = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.weight)
bn_layer.weight.data.copy_(bn_w)
ptr += num_b
# Running Mean
bn_rm = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_mean)
bn_layer.running_mean.data.copy_(bn_rm)
ptr += num_b
# Running Var
bn_rv = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(bn_layer.running_var)
bn_layer.running_var.data.copy_(bn_rv)
ptr += num_b
else:
# Load conv. bias
num_b = conv_layer.bias.numel()
conv_b = torch.from_numpy(weights[ptr : ptr + num_b]).view_as(conv_layer.bias)
conv_layer.bias.data.copy_(conv_b)
ptr += num_b
# Load conv. weights
num_w = conv_layer.weight.numel()
conv_w = torch.from_numpy(weights[ptr : ptr + num_w]).view_as(conv_layer.weight)
conv_layer.weight.data.copy_(conv_w)
ptr += num_w
def save_darknet_weights(self, path, cutoff=-1):
"""
@:param path - path of the new weights file
@:param cutoff - save layers between 0 and cutoff (cutoff = -1 -> all are saved)
"""
fp = open(path, "wb")
self.header_info[3] = self.seen
self.header_info.tofile(fp)
# Iterate through layers
for i, (module_def, module) in enumerate(zip(self.module_defs[:cutoff], self.module_list[:cutoff])):
if module_def["type"] == "convolutional":
conv_layer = module[0]
# If batch norm, load bn first
if module_def["batch_normalize"]:
bn_layer = module[1]
bn_layer.bias.data.cpu().numpy().tofile(fp)
bn_layer.weight.data.cpu().numpy().tofile(fp)
bn_layer.running_mean.data.cpu().numpy().tofile(fp)
bn_layer.running_var.data.cpu().numpy().tofile(fp)
# Load conv bias
else:
conv_layer.bias.data.cpu().numpy().tofile(fp)
# Load conv weights
conv_layer.weight.data.cpu().numpy().tofile(fp)
fp.close()
The input image is resized to (416, 416, 3) in the dataloader, and the target annotations contain bounding box coordinates with 6 elements (class_label, cx, cy, w, h, angle).
I hope this is the information you asked for. Please let me know if I misunderstood something.
Thanks for the code snippet. Unfortunately, a lot of methods are undefined. Could you add them so that I can create a model, train it for some iterations, then save and restore it, in order to debug the drop in performance after reloading the model?
Sure. However, there is an update. I was examining the code to see if any external parameters are different. It turns out that during training I use the mean and std deviation of the training dataset to normalize the pixel values of both the train and eval datasets, but when I evaluate the model later from checkpoints, the new script uses a different mean and std deviation (those of the eval dataset) instead of the training-set statistics, which should be used.
After making this correction in the code, the performance has improved greatly. The mAP during training was 0.52, and the mAP in the new script when the model is restored is 0.501. However, there is still a minor drop in mAP.
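The fix amounts to computing the mean/std once on the training set, persisting them (e.g. with the write_ms/load_ms helpers shown below), and reusing them in the evaluation script. A minimal sketch of the idea, with illustrative values:

import torchvision.transforms as T

# Statistics computed once on the *training* set (values are illustrative)
train_mean = [0.485, 0.456, 0.406]
train_std = [0.229, 0.224, 0.225]

# Both the training script and the later evaluation script should build
# their normalization from these training-set statistics
normalize = T.Normalize(mean=train_mean, std=train_std)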
The additional definitions that are used from utils.py are as follows:
import numpy as np
import torch
import tqdm

def to_cpu(tensor):
    return tensor.detach().cpu()
def load_classes(path):
"""
Loads class labels at 'path'
"""
fp = open(path, "r")
names = fp.read().split("\n")[:-1]
return names
def load_ms(path):
"""
Load mean and std deviation values from a text file
"""
with open(path, "r") as ms: ### Read mean and standard deviation
ms_values = ms.readlines()
ms_values = [s.strip() for s in ms_values]
mean_val = [float(s) for s in ms_values[0].split()]
std_val = [float(s) for s in ms_values[1].split()]
return mean_val, std_val
def write_ms(path, values):
"""
Write mean and std values to a txt file
path: To where the file should be stored
values: list of mean and std [[float],[float]]
"""
text_file = open(path, 'w')
for sing in values:
text_file.writelines( ["%f " % item for item in sing] )
text_file.write("\n")
text_file.close()
def weights_init_normal(m):
classname = m.__class__.__name__
if classname.find("Conv") != -1:
torch.nn.init.normal_(m.weight.data, 0.0, 0.02)
elif classname.find("BatchNorm2d") != -1:
torch.nn.init.normal_(m.weight.data, 1.0, 0.02)
torch.nn.init.constant_(m.bias.data, 0.0)
def rescale_boxes(boxes, current_dim, original_shape):
""" Rescales bounding boxes to the original shape """
orig_h, orig_w = original_shape
# The amount of padding that was added
pad_x = max(orig_h - orig_w, 0) * (current_dim / max(original_shape))
pad_y = max(orig_w - orig_h, 0) * (current_dim / max(original_shape))
# Image height and width after padding is removed
unpad_h = current_dim - pad_y
unpad_w = current_dim - pad_x
# Rescale bounding boxes to dimension of original image
boxes[:, 0] = ((boxes[:, 0] - pad_x // 2) / unpad_w) * orig_w
boxes[:, 1] = ((boxes[:, 1] - pad_y // 2) / unpad_h) * orig_h
boxes[:, 2] = ((boxes[:, 2] - pad_x // 2) / unpad_w) * orig_w
boxes[:, 3] = ((boxes[:, 3] - pad_y // 2) / unpad_h) * orig_h
return boxes
def xywh2xyxy(x):
y = x.new(x.shape)
y[..., 0] = x[..., 0] - x[..., 2] / 2
y[..., 1] = x[..., 1] - x[..., 3] / 2
y[..., 2] = x[..., 0] + x[..., 2] / 2
y[..., 3] = x[..., 1] + x[..., 3] / 2
return y
def xyxy2xywh(x):
y = x.new(x.shape)
w, h = x[..., 2] - x[..., 0], x[..., 3] - x[..., 1]
y[...,0] = x[...,0] + (w / 2)
y[...,1] = x[...,1] + (h / 2)
y[...,2] = w
y[...,3] = h
return y
def ap_per_class(tp, conf, pred_cls, target_cls):
""" Compute the average precision, given the recall and precision curves.
Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
# Arguments
tp: True positives (list).
conf: Objectness value from 0-1 (list).
pred_cls: Predicted object classes (list).
target_cls: True object classes (list).
# Returns
The average precision as computed in py-faster-rcnn.
"""
# Sort by objectness
i = np.argsort(-conf)
tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]
# Find unique classes
unique_classes = np.unique(target_cls)
# Create Precision-Recall curve and compute AP for each class
ap, p, r = [], [], []
for c in tqdm.tqdm(unique_classes, desc="Computing AP"):
i = pred_cls == c
n_gt = (target_cls == c).sum() # Number of ground truth objects
n_p = i.sum() # Number of predicted objects
if n_p == 0 and n_gt == 0:
continue
elif n_p == 0 or n_gt == 0:
ap.append(0)
r.append(0)
p.append(0)
else:
# Accumulate FPs and TPs
fpc = (1 - tp[i]).cumsum()
tpc = (tp[i]).cumsum()
# Recall
recall_curve = tpc / (n_gt + 1e-16)
r.append(recall_curve[-1])
# Precision
precision_curve = tpc / (tpc + fpc)
p.append(precision_curve[-1])
# AP from recall-precision curve
ap.append(compute_ap(recall_curve, precision_curve))
# Compute F1 score (harmonic mean of precision and recall)
p, r, ap = np.array(p), np.array(r), np.array(ap)
f1 = 2 * p * r / (p + r + 1e-16)
return p, r, ap, f1, unique_classes.astype("int32")
def compute_ap(recall, precision):
""" Compute the average precision, given the recall and precision curves.
Code originally from https://github.com/rbgirshick/py-faster-rcnn.
# Arguments
recall: The recall curve (list).
precision: The precision curve (list).
# Returns
The average precision as computed in py-faster-rcnn.
"""
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.0], recall, [1.0]))
mpre = np.concatenate(([0.0], precision, [0.0]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
def get_batch_statistics(outputs, targets, iou_threshold):
""" Compute true positives, predicted scores and predicted labels per sample """
batch_metrics = []
for sample_i in range(len(outputs)):
if outputs[sample_i] is None:
continue
output = outputs[sample_i]
pred_boxes = output[:, :5]
pred_scores = output[:, 5]
pred_labels = output[:, -1]
true_positives = np.zeros(pred_boxes.shape[0])
annotations = targets[targets[:, 0] == sample_i][:, 1:]
target_labels = annotations[:, 0] if len(annotations) else []
if len(annotations):
detected_boxes = []
target_boxes = annotations[:, 1:]
for pred_i, (pred_box, pred_label) in enumerate(zip(pred_boxes, pred_labels)):
# If all targets have been found, break
if len(detected_boxes) == len(annotations):
break
# Ignore if label is not one of the target labels
if pred_label not in target_labels:
continue
#iou, box_index = bbox_iou(pred_box.unsqueeze(0), target_boxes).max(0) # Only checkes once, later if detection with better iou arrives will be ignored
iou = iou_rotated(pred_box.unsqueeze(0), target_boxes)
mask_matched = (target_labels == pred_label) & (iou >= iou_threshold)
iou_matched = torch.where(mask_matched, iou, torch.zeros_like(iou))
iou_max, box_index = iou_matched.max(0)
#if iou >= iou_threshold and box_index not in detected_boxes and pred_label == target_labels[box_index]:
if iou_max >= iou_threshold and box_index not in detected_boxes:
true_positives[pred_i] = 1
detected_boxes += [box_index]
batch_metrics.append([true_positives, pred_scores, pred_labels])
return batch_metrics
def rotate_detections(x1, y1, x2, y2, angle, xyxy=True):
FloatTensor = torch.cuda.FloatTensor if x1.is_cuda else torch.FloatTensor
if xyxy:
w, h = x2 - x1, y2 - y1
x, y = x1 + w/2, y1 + h/2
else:
# Get the coordinates of bounding boxes
x, y, w, h = x1, y1, x2, y2
# Get co-ordinates for rotated angle
if not x.size():
c, s = np.cos(angle/180*np.pi), np.sin(angle/180*np.pi)
R = np.asarray([[c, s], [-s, c]])
pts = np.asarray([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
rot_pts = []
for pt in pts:
rot_pts.append(([x, y] + pt @ R).astype(float))
contours = FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])
else:
contours = []
for i in range(x.size(0)):
c, s = np.cos(angle[i]/180*np.pi), np.sin(angle[i]/180*np.pi)
R = np.asarray([[c, s], [-s, c]])
pts = np.asarray([[-w[i]/2, -h[i]/2], [w[i]/2, -h[i]/2], [w[i]/2, h[i]/2], [-w[i]/2, h[i]/2]])
rot_pts = []
for pt in pts:
rot_pts.append(([x[i], y[i]] + pt @ R).astype(float))
contours += [FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])]
return contours
def bbox_wh_iou(wh1, wh2):
wh2 = wh2.t()
w1, h1 = wh1[0], wh1[1]
w2, h2 = wh2[0], wh2[1]
inter_area = torch.min(w1, w2) * torch.min(h1, h2)
union_area = (w1 * h1 + 1e-16) + w2 * h2 - inter_area
return inter_area / union_area
def bbox_iou(box1, box2, x1y1x2y2=True):
"""
Returns the IoU of two bounding boxes
"""
if not x1y1x2y2:
# Transform from center and width to exact coordinates
b1_x1, b1_x2 = box1[:, 0] - box1[:, 2] / 2, box1[:, 0] + box1[:, 2] / 2
b1_y1, b1_y2 = box1[:, 1] - box1[:, 3] / 2, box1[:, 1] + box1[:, 3] / 2
b2_x1, b2_x2 = box2[:, 0] - box2[:, 2] / 2, box2[:, 0] + box2[:, 2] / 2
b2_y1, b2_y2 = box2[:, 1] - box2[:, 3] / 2, box2[:, 1] + box2[:, 3] / 2
else:
# Get the coordinates of bounding boxes
b1_x1, b1_y1, b1_x2, b1_y2 = box1[:, 0], box1[:, 1], box1[:, 2], box1[:, 3]
b2_x1, b2_y1, b2_x2, b2_y2 = box2[:, 0], box2[:, 1], box2[:, 2], box2[:, 3]
# get the coordinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
# Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
# Union Area
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
return iou
def calculate_rotated(x, y, w, h, angle):
'''
angle: degree
'''
FloatTensor = torch.cuda.FloatTensor if x.is_cuda else torch.FloatTensor
w = w.item()
h = h.item()
#w, h = w.numpy(), h.numpy()
c, s = np.cos(angle.item()/180*np.pi), np.sin(angle.item()/180*np.pi)
R = np.asarray([[c, s], [-s, c]])
pts = np.asarray([[-w/2, -h/2], [w/2, -h/2], [w/2, h/2], [-w/2, h/2]])
rot_pts = []
for pt in pts:
rot_pts.append(([x, y] + pt @ R).astype(float))
contours = FloatTensor([rot_pts[0], rot_pts[1], rot_pts[2], rot_pts[3]])
return contours
def iou_rotated(box1, box2, x1y1x2y2=True):
FloatTensor = torch.cuda.FloatTensor if box1.is_cuda else torch.FloatTensor
if not x1y1x2y2:
#Get center co-ordinates and w & h
b1_cx, b1_cy, b1_w, b1_h = box1[:,0], box1[:,1], box1[:,2], box1[:,3]
b2_cx, b2_cy, b2_w, b2_h = box2[:,0], box2[:,1], box2[:,2], box2[:,3]
else:
# Transform co-ordinates to x,y,w,h
b1_w, b1_h = box1[:,2] - box1[:,0], box1[:,3] - box1[:,1]
b1_cx, b1_cy = box1[:,0] + b1_w / 2, box1[:,1] + b1_h / 2
b2_w, b2_h = box2[:,2] - box2[:,0], box2[:,3] - box2[:,1]
b2_cx, b2_cy = box2[:,0] + b2_w / 2, box2[:,1] + b2_h / 2
#get angle for rotation for all bounding boxes
angle_1 = box1[:,-1]
angle_2 = box2[:,-1]
if len(box1) == 1:
iou_all = FloatTensor(box2.size(0)).fill_(0)
for i in range(len(box2)):
#Check if any element equals to infinity
if box1[0,0]==np.inf or box1[0,1]==np.inf or box1[0,2]==np.inf or box1[0,3]==np.inf \
or box2[i,0]==np.inf or box2[i,1]==np.inf or box2[i,2]==np.inf or box2[i,3]==np.inf:
iou = 1e-12
else:
rot_box1 = calculate_rotated(b1_cx[0], b1_cy[0], b1_w[0], b1_h[0], angle_1[0])
rot_box2 = calculate_rotated(b2_cx[i], b2_cy[i], b2_w[i], b2_h[i], angle_2[i])
b1_x1, b1_y1 = rot_box1.min(0)[0]
b1_x2, b1_y2 = rot_box1.max(0)[0]
b2_x1, b2_y1 = rot_box2.min(0)[0]
b2_x2, b2_y2 = rot_box2.max(0)[0]
# get the co-ordinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
# Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
# Union Area
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
# rot_box1 = Polygon( [ rot_box1[0], rot_box1[1], rot_box1[2], rot_box1[3] ] )
# rot_box2 = Polygon( [ rot_box2[0], rot_box2[1], rot_box2[2], rot_box2[3] ] )
# # Intersection area
# inter_area = rot_box1.intersection(rot_box2).area
# # Union Area
# union_area = rot_box1.union(rot_box2).area
# iou = inter_area / (union_area + 1e-9)
iou_all[i] = iou
return iou_all
else:
assert(len(box1) == len(box2))
#rotate every bbox
iou_all = FloatTensor(box1.size(0)).fill_(0)
for i in range(len(box1)):
#Check if any element equals to infinity
if box1[i,0]==np.inf or box1[i,1]==np.inf or box1[i,2]==np.inf or box1[i,3]==np.inf \
or box2[i,0]==np.inf or box2[i,1]==np.inf or box2[i,2]==np.inf or box2[i,3]==np.inf:
iou = 1e-12
else:
rot_box1 = calculate_rotated(b1_cx[i], b1_cy[i], b1_w[i], b1_h[i], angle_1[i])
rot_box2 = calculate_rotated(b2_cx[i], b2_cy[i], b2_w[i], b2_h[i], angle_2[i])
b1_x1, b1_y1 = rot_box1.min(0)[0]
b1_x2, b1_y2 = rot_box1.max(0)[0]
b2_x1, b2_y1 = rot_box2.min(0)[0]
b2_x2, b2_y2 = rot_box2.max(0)[0]
# get the co-ordinates of the intersection rectangle
inter_rect_x1 = torch.max(b1_x1, b2_x1)
inter_rect_y1 = torch.max(b1_y1, b2_y1)
inter_rect_x2 = torch.min(b1_x2, b2_x2)
inter_rect_y2 = torch.min(b1_y2, b2_y2)
# Intersection area
inter_area = torch.clamp(inter_rect_x2 - inter_rect_x1 + 1, min=0) * torch.clamp(
inter_rect_y2 - inter_rect_y1 + 1, min=0
)
# Union Area
b1_area = (b1_x2 - b1_x1 + 1) * (b1_y2 - b1_y1 + 1)
b2_area = (b2_x2 - b2_x1 + 1) * (b2_y2 - b2_y1 + 1)
iou = inter_area / (b1_area + b2_area - inter_area + 1e-16)
# rot_box1 = Polygon( [ rot_box1[0], rot_box1[1], rot_box1[2], rot_box1[3] ] )
# rot_box2 = Polygon( [ rot_box2[0], rot_box2[1], rot_box2[2], rot_box2[3] ] )
# # Intersection area
# inter_area = rot_box1.intersection(rot_box2).area
# # Union Area
# union_area = rot_box1.union(rot_box2).area
# iou = inter_area / union_area
iou_all[i] = iou
return iou_all
def non_max_suppression(prediction, conf_thres=0.5, nms_thres=0.4):
"""
Removes detections with lower object confidence score than 'conf_thres' and performs
Non-Maximum Suppression to further filter detections.
Returns detections with shape:
(x1, y1, x2, y2, object_conf, class_score, class_pred)
"""
# From (center x, center y, width, height) to (x1, y1, x2, y2)
prediction[..., :4] = xywh2xyxy(prediction[..., :4])
output = [None for _ in range(len(prediction))]
for image_i, image_pred in enumerate(prediction):
# Filter out confidence scores below threshold
image_pred = image_pred[image_pred[:, 5] >= conf_thres]
# If none are remaining => process next image
if not image_pred.size(0):
continue
# Object confidence times class confidence
score = image_pred[:, 5] * image_pred[:, 6:].max(1)[0]
# Sort by it
image_pred = image_pred[(-score).argsort()]
class_confs, class_preds = image_pred[:, 6:].max(1, keepdim=True)
detections = torch.cat((image_pred[:, :6], class_confs.float(), class_preds.float()), 1)
# Perform non-maximum suppression
keep_boxes = []
while detections.size(0):
large_overlap = iou_rotated(detections[0, :5].unsqueeze(0), detections[:, :5]) > nms_thres
label_match = detections[0, -1] == detections[:, -1]
# Indices of boxes with lower confidence scores, large IOUs and matching labels
invalid = large_overlap & label_match
weights = detections[invalid, 5:6]
# Merge overlapping bboxes by order of confidence
detections[0, :4] = (weights * detections[invalid, :4]).sum(0) / weights.sum()
keep_boxes += [detections[0]]
detections = detections[~invalid]
if keep_boxes:
output[image_i] = torch.stack(keep_boxes)
# for o_i, out in enumerate(output):
# if out == None:
# output[o_i] = torch.zeros(1,8)
return output
def build_targets(pred_boxes, pred_cls, target, anchors, ignore_thres):
ByteTensor = torch.cuda.BoolTensor if pred_boxes.is_cuda else torch.BoolTensor
FloatTensor = torch.cuda.FloatTensor if pred_boxes.is_cuda else torch.FloatTensor
nB = pred_boxes.size(0)
nA = pred_boxes.size(1)
nC = pred_cls.size(-1)
nG = pred_boxes.size(2)
nt = target.size(0)
# Output tensors
obj_mask = ByteTensor(nB, nA, nG, nG).fill_(0)
noobj_mask = ByteTensor(nB, nA, nG, nG).fill_(1)
class_mask = FloatTensor(nB, nA, nG, nG).fill_(0)
iou_scores = FloatTensor(nB, nA, nG, nG).fill_(0)
tx = FloatTensor(nB, nA, nG, nG).fill_(0)
ty = FloatTensor(nB, nA, nG, nG).fill_(0)
tw = FloatTensor(nB, nA, nG, nG).fill_(0)
th = FloatTensor(nB, nA, nG, nG).fill_(0)
tangle = FloatTensor(nB, nA, nG, nG).fill_(0)
tcls = FloatTensor(nB, nA, nG, nG, nC).fill_(0)
target_boxes = FloatTensor(nt,5).fill_(0)
# Convert to position relative to box
target_boxes[:,:4] = target[:, 2:6] * nG
target_boxes[:,4] = target[:, 6]
gxy = target_boxes[:, :2]
gwh = target_boxes[:, 2:4]
gangle = target_boxes[:, 4]
# Get anchors with best iou
ious = torch.stack([bbox_wh_iou(anchor, gwh) for anchor in anchors])
best_ious, best_n = ious.max(0)
# Separate target values
b, target_labels = target[:, :2].long().t()
gx, gy = gxy.t()
gw, gh = gwh.t()
gi, gj = gxy.long().t()
# Set masks
obj_mask[b, best_n, gj, gi] = 1
noobj_mask[b, best_n, gj, gi] = 0
# Set noobj mask to zero where iou exceeds ignore threshold
for i, anchor_ious in enumerate(ious.t()):
noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
# Coordinates
tx[b, best_n, gj, gi] = gx - gx.floor()
ty[b, best_n, gj, gi] = gy - gy.floor()
# Width and height
tw[b, best_n, gj, gi] = torch.log(gw / anchors[best_n][:, 0] + 1e-16)
th[b, best_n, gj, gi] = torch.log(gh / anchors[best_n][:, 1] + 1e-16)
# Angle
tangle[b, best_n, gj, gi] = gangle
# One-hot encoding of label
tcls[b, best_n, gj, gi, target_labels] = 1
# Compute label correctness and iou at best anchor
class_mask[b, best_n, gj, gi] = (pred_cls[b, best_n, gj, gi].argmax(-1) == target_labels).float()
iou_scores[b, best_n, gj, gi] = iou_rotated(pred_boxes[b, best_n, gj, gi], target_boxes, x1y1x2y2=False)
tconf = obj_mask.float()
return iou_scores, class_mask, obj_mask, noobj_mask, tx, ty, tw, th, tangle, tcls, tconf
I think the best solution for this kind of need is using pytorch_lightning.
Loading weights and training: Saving and loading weights — PyTorch Lightning 1.1.5 documentation
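For reference, restoring weights in Lightning (as of the 1.1.x API linked above) looks roughly like this; LitModel is a placeholder for your own LightningModule subclass:

import pytorch_lightning as pl

# LitModel is a placeholder LightningModule
model = LitModel.load_from_checkpoint('path/to/checkpoint.ckpt')
model.eval()

# Resuming training from the same checkpoint (1.1.x-era API)
trainer = pl.Trainer(resume_from_checkpoint='path/to/checkpoint.ckpt')
trainer.fit(model)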
Thanks for the suggestion. I will try that later. Actually, I am working on my master's thesis, so I am somewhat short on time, and if I understand correctly, I would have to convert my whole training script to the PyTorch Lightning structure, which would take quite some time.
However, for me the problem is solved, as I can move ahead with this small discrepancy in performance. Still, I was curious about what causes the drop in performance with the exact same setup, and would like to know if there is a valid reason.