I am currently in the process of setting up model monitoring for models served with TorchServe
on Kubernetes. Ideally, I would like to store input and output images for later manual prediction inspection.
I was wondering what would be the best way to achieve such a setup in a custom handler:
- Dump the preprocessed image and the model output for a sampled fraction of requests in the handler's `inference` method (so that such an expensive operation does not impact inference performance too heavily). In this scenario the consumer should provide a `prediction_id` so that the request data can later be reconciled with any user feedback or user-generated event.
```python
import random

def inference(self, input):
    img, pred_id = input
    output = self.model(img)
    # Only log 20% of requests (example)
    if random.random() < 0.2:
        dump_to_gcs(pred_id, img)
        dump_to_gcs(pred_id, output)
    return output
```
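If even the sampled dumps turn out to be too costly inline, the upload could be pushed off the request thread. A minimal sketch, assuming the same hypothetical `dump_to_gcs` helper as above:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Small shared pool so serialization/upload happens off the inference path.
_dump_pool = ThreadPoolExecutor(max_workers=2)

def inference(self, input):
    img, pred_id = input
    output = self.model(img)
    if random.random() < 0.2:
        # If these are GPU tensors, copy them first (e.g. .detach().cpu())
        # before handing them to another thread.
        _dump_pool.submit(dump_to_gcs, pred_id, img)
        _dump_pool.submit(dump_to_gcs, pred_id, output)
    return output
```

One caveat: the executor's internal queue is unbounded, so if GCS slows down, pending dumps accumulate in memory; dropping samples when the pool falls behind would be a reasonable safeguard.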
- Let the consumer log inputs/outputs as part of the request handling (see the sketch after this list). A couple of pros/cons:
	- pro: being done outside the inference container, there is no impact on inference performance.
	- con: the raw input would be stored - this means that for models with heavy preprocessing before inference, the same transformations would have to be applied to the stored data before it is usable (especially relevant for models based on tabular data).
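For the second option, consumer-side logging could look roughly like the following. This is only a sketch: the route is TorchServe's default inference endpoint (`POST /predictions/{model_name}` on port 8080), while the host, the model name and the `save_to_gcs` helper are placeholders:

```python
import uuid
import requests

def predict_and_log(image_bytes: bytes) -> dict:
    pred_id = str(uuid.uuid4())
    # TorchServe's default inference endpoint: POST /predictions/{model_name}
    resp = requests.post(
        "http://torchserve:8080/predictions/my_model",  # hypothetical service/model name
        data=image_bytes,
    )
    resp.raise_for_status()
    # Store the *raw* input and the output, keyed by prediction_id,
    # so they can be reconciled with user feedback later.
    save_to_gcs(f"inputs/{pred_id}.png", image_bytes)    # hypothetical helper
    save_to_gcs(f"outputs/{pred_id}.json", resp.content)
    return {"prediction_id": pred_id, "output": resp.json()}
```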
As expected, the usual array of checks, validations and alerts will be built on top of the collected data to flag retraining needs and different kinds of drift.
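As one concrete example of such a check (my assumption of a reasonable starting point, not a prescribed method): compare the distribution of recent prediction scores against a reference window with a two-sample KS test.

```python
import numpy as np
from scipy.stats import ks_2samp

def check_prediction_drift(reference: np.ndarray, recent: np.ndarray,
                           p_threshold: float = 0.01) -> bool:
    """Two-sample Kolmogorov-Smirnov test on prediction score distributions.

    Returns True (alert) when the recent window looks drawn from a
    different distribution than the reference window. The threshold
    and window sizes are arbitrary and would need tuning.
    """
    stat, p_value = ks_2samp(reference, recent)
    return p_value < p_threshold
```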
I was wondering what the commonly shared approach looks like in this case and what kind of gotchas one should be mindful of.