During inference the CPU sits at 100% load while the GPU stays almost idle. This is inference only, so the input image is light (about 400x400 px) and the model is also very lightweight. The CPU-heavy work happens on this line:

image_transformed = transform(image).unsqueeze(0).to(device)

I can't batch the requests, so a new image is inspected every time. How can I reduce the CPU load? The inference itself takes only 5 ms, but with the CPU pegged at 100% my application runs out of cycles for its other main tasks. Here is the code I isolated to reproduce the problem; after it I list a few mitigations I'm considering, plus a note on the timing:
import time

import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from PIL import Image

# image_path, PathTrain and PathModel are defined elsewhere in the application
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print("Using device:", device)

def main():
    # Load and preprocess an image for inference
    image = Image.open(image_path)
    ###
    # Define image transformations
    transform = transforms.Compose([
        transforms.Resize((224, 224)),  # 224, as for the ResNet-50
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225]
        )
    ])
    # The training set is only loaded here to recover the class names
    trainset = torchvision.datasets.ImageFolder(root=PathTrain, transform=transform)
    classes = trainset.classes
    # Pretrained weights are unnecessary; the state dict below overwrites them
    model = torchvision.models.squeezenet1_1(weights=None)
    model.classifier[1] = nn.Conv2d(512, len(classes), kernel_size=(1, 1))
    model.num_classes = len(classes)
    model.load_state_dict(torch.load(PathModel, map_location=device))
    model.eval()
    model.to(device)
    ###
    while True:
        try:
            start_time = time.time()
            # Preprocess the image and run the forward pass
            image_transformed = transform(image).unsqueeze(0).to(device)
            with torch.no_grad():
                output = model(image_transformed)
            _, indice = torch.max(output, 1)
            elapsed_time = (time.time() - start_time) * 1000
        except Exception as e:
            response = 'ERROR'
        finally:
            time.sleep(0.1)

if __name__ == "__main__":
    main()
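First idea: capping PyTorch's CPU thread pools so the preprocessing cannot fan out over every core. A minimal sketch, run once at startup (the single-thread values are an assumption, not something I have profiled):

import torch

# Limit intra-op and inter-op CPU parallelism to one thread each.
# set_num_interop_threads() must be called before any inter-op parallel
# work starts, otherwise it raises a RuntimeError.
torch.set_num_threads(1)
torch.set_num_interop_threads(1)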
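Second idea: moving the normalization onto the GPU, so the per-frame CPU work shrinks to the resize and the float conversion. A sketch assuming device, image, and the imports from the code above:

mean = torch.tensor([0.485, 0.456, 0.406], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.229, 0.224, 0.225], device=device).view(1, 3, 1, 1)

cpu_transform = transforms.Compose([
    transforms.Resize((224, 224)),  # still on the CPU (PIL), but cheap at this size
    transforms.ToTensor(),          # HWC PIL image -> CHW float tensor in [0, 1]
])

image_transformed = cpu_transform(image).unsqueeze(0).to(device)
image_transformed = (image_transformed - mean) / std  # normalization now runs on the GPU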
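Finally, a note on the 5 ms figure: CUDA launches are asynchronous, so time.time() around the forward pass may only measure the kernel launch. To double-check the timing I would synchronize explicitly (same model and image_transformed as above):

torch.cuda.synchronize()              # wait for any pending GPU work
start_time = time.time()
with torch.no_grad():
    output = model(image_transformed)
torch.cuda.synchronize()              # wait for the forward pass to finish
elapsed_time = (time.time() - start_time) * 1000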