Hello,
I’m fairly new to PyTorch and ML in general. I have been able to successfully retrain
a pretrained and modified torchvision.models.detection.fasterrcnn_resnet50_fpn_v2 model using around 13,000 training images and 4,000 validation images.
The application is medical detection of muscle groups from images. It is important for this application that the IoU is accurate, because the detections will be used in a machine vision image processing pipeline.
The idea is to export the trained model to ONNX and execute it on an embedded device based on a 4-core Intel CPU.
The model’s results are generally very good, and I achieved a final IoU of around 0.92 on my validation set during training.
When running the model from PyTorch and from ONNX on images that are not in the training or validation data, the accuracy is as expected.
The problem is that inference is extremely slow: around 16 seconds or longer for a single image when using ONNX Runtime (ORT), and much slower than that when running the model from PyTorch.
The model was trained on an NVIDIA 3060 GPU with 8 GB of memory, and inference is performed on a CPU. Will that have an effect?
The modifications that were made to the model for training on my images are shown below:
import torch
import torchvision
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Load the pre-trained Faster R-CNN model with ResNet-50-FPN backbone
model = torchvision.models.detection.fasterrcnn_resnet50_fpn_v2(weights=torchvision.models.detection.FasterRCNN_ResNet50_FPN_V2_Weights.DEFAULT)
# Modify the model architecture
num_classes = 2 # Number of classes including the background class
# Get the number of input features for the box predictor
in_features = model.roi_heads.box_predictor.cls_score.in_features
# Replace the box predictor with your own
model.roi_heads.box_predictor = torchvision.models.detection.faster_rcnn.FastRCNNPredictor(in_features, num_classes)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Define the optimizer
optimizer = optim.Adam(model.parameters(), lr=learning_rate, betas=betas, weight_decay=weight_decay)
# the scheduler will reduce the learning rate depending on validation loss
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=learning_rate_step, patience=3, verbose=True)
....
#training complete
# Save the trained model
torch.save(model.state_dict(), "Detector.pth")
# Export the model to ONNX format
model.eval()  # make sure the model is in inference mode before exporting
dummy_input = torch.randn(1, 3, 720, 576).to(device)  # Create a dummy input
torch.onnx.export(model, dummy_input, "Detector.onnx", opset_version=14)
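For reference, this is roughly how I run and time the exported model with ONNX Runtime on the CPU (a minimal sketch; the dummy input and session options here are simplified assumptions rather than my exact pipeline code):
import time
import numpy as np
import onnxruntime as ort

# Create a CPU-only inference session (assumption: 4 intra-op threads to match the 4-core target CPU)
sess_options = ort.SessionOptions()
sess_options.intra_op_num_threads = 4
session = ort.InferenceSession("Detector.onnx", sess_options, providers=["CPUExecutionProvider"])

# Dummy NCHW float32 input matching the export shape (the real code loads an image scaled to [0, 1])
image = np.random.rand(1, 3, 720, 576).astype(np.float32)
input_name = session.get_inputs()[0].name

start = time.perf_counter()
outputs = session.run(None, {input_name: image})  # exported detector returns boxes, labels, scores
print(f"Inference took {time.perf_counter() - start:.2f} s")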
I have several ideas about what might be done to improve things. For instance, I could retrain the entire model from scratch rather than starting from pretrained weights; retraining takes about 7 hours on my GPU.
I could use quantization-aware training.
I could try ONNX quantization on the exported model (see the sketch below).
I don’t know whether torch quantization will work for ONNX export; I have read that it is not supported.
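On the ONNX side, my understanding is that ONNX Runtime offers post-training dynamic quantization, which I imagine would look roughly like this (a sketch only; the output filename is just an example, and I have not measured accuracy or speed after quantizing):
from onnxruntime.quantization import QuantType, quantize_dynamic

# Post-training dynamic quantization: weights stored as int8, activations quantized on the fly
quantize_dynamic(
    model_input="Detector.onnx",
    model_output="Detector_int8.onnx",  # example output filename, not something I have tried yet
    weight_type=QuantType.QInt8,
)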
I could resize my images to 300x300, but I would rather keep them at their original size of 720x576x3 if possible so that adequate detail is maintained.
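If I did resize, the preprocessing would be something along these lines (a sketch; antialiased bilinear resizing via torchvision is an assumption):
import torch
import torchvision.transforms.functional as TF

# Stand-in (C, H, W) tensor for one of my 720x576 images
image_tensor = torch.rand(3, 576, 720)

# Resize to 300x300 (assumption: antialiased bilinear interpolation)
small = TF.resize(image_tensor, [300, 300], antialias=True)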
Does anyone know what may be done to achieve lower inference times? Even a 1-second inference time per image would be acceptable for this application.
Thank you