How to implement dynamic or cost-aware data prefetching logic using the PyTorch DataLoader?

I am pretty new to PyTorch, but I was wondering if it is possible to implement cost-aware data prefetching in PyTorch for maximum GPU utilization. That is, not just statically setting prefetch_factor, but rather making prefetching batch- and cost-aware. Any ideas, or frameworks that have implemented this kind of thing?
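To illustrate the idea, here is a minimal, framework-agnostic sketch of what "cost-aware" prefetching could mean: a background thread fills a queue with batches, and the target queue depth is adjusted at runtime by comparing how long the consumer waits for a batch against how long batches take to load. This is a toy producer/consumer model, not the DataLoader API (DataLoader's prefetch_factor is fixed at construction); the class name and thresholds are made up for illustration.

```python
import queue
import threading
import time

class AdaptivePrefetcher:
    """Toy cost-aware prefetcher (illustrative sketch, not a PyTorch API).

    A background thread fills a queue with items; the target queue depth
    grows when the consumer waits on the loader (loading is the
    bottleneck) and shrinks otherwise, bounded by max_depth to cap memory.
    """

    def __init__(self, iterable, min_depth=1, max_depth=8):
        self._it = iter(iterable)
        self.min_depth = min_depth
        self.max_depth = max_depth
        self.depth = min_depth          # current prefetch target, adjusted live
        self._q = queue.Queue()
        self._done = object()           # sentinel marking end of data
        self._thread = threading.Thread(target=self._worker, daemon=True)
        self._thread.start()

    def _worker(self):
        while True:
            # Respect the (dynamically adjusted) target depth.
            while self._q.qsize() >= self.depth:
                time.sleep(0.001)
            t0 = time.perf_counter()
            try:
                item = next(self._it)
            except StopIteration:
                self._q.put(self._done)
                return
            load_time = time.perf_counter() - t0
            self._q.put((item, load_time))

    def __iter__(self):
        return self

    def __next__(self):
        t0 = time.perf_counter()
        got = self._q.get()
        wait_time = time.perf_counter() - t0
        if got is self._done:
            raise StopIteration
        item, load_time = got
        # Cost-aware adjustment: if the consumer had to wait relative to
        # the load cost, prefetch deeper; otherwise back off to save memory.
        if wait_time > 0.5 * load_time and self.depth < self.max_depth:
            self.depth += 1
        elif wait_time < 0.1 * load_time and self.depth > self.min_depth:
            self.depth -= 1
        return item
```

Usage would be wrapping any iterable (e.g. a DataLoader instance) and iterating as usual: `for batch in AdaptivePrefetcher(loader): ...`. A real implementation would also need to account for pinned-memory limits and CUDA stream overlap, which this sketch ignores.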

Is it something similar to this: GitHub - Rahm-no/MinatoLoader: Artifact for EuroSys'26 paper "MinatoLoader: Accelerating Machine Learning Training Through Efficient Data Preprocessing"