I'm benchmarking my data pipeline design, and I would like to get more information about the pipeline at runtime.
For example, I would like to know the occupancy of the prefetch buffer, to see whether the buffer size is set too large or too small.
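To make the prefetch question concrete, here is a minimal sketch of the kind of information I am after. It uses a hand-rolled, thread-based prefetcher in plain Python, not the real pipeline: `prefetch_with_occupancy` and `_DONE` are hypothetical names I made up for illustration, and I am not claiming any library exposes its buffer this way.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of the source


def prefetch_with_occupancy(source, buffer_size):
    """Yield (item, occupancy) pairs from a background-filled buffer.

    occupancy is the queue size sampled just before the item is taken.
    It is approximate: the producer runs concurrently, and near the end
    of the stream the count may include the end-of-stream sentinel.
    """
    buf = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in source:
            buf.put(item)  # blocks while the buffer is full
        buf.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        occupancy = buf.qsize()
        item = buf.get()  # blocks while the buffer is empty
        if item is _DONE:
            return
        yield item, occupancy


if __name__ == "__main__":
    for item, occupancy in prefetch_with_occupancy(range(10), buffer_size=4):
        print(item, occupancy)
```

If the sampled occupancy stays close to zero, the consumer is waiting on the producer and a larger buffer is unlikely to help; if it stays close to `buffer_size`, the buffer could probably be smaller. That is the signal I would like to read from the real pipeline.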
In another scenario, I want to benchmark the speed of a data pipeline without counting the time it takes to fill the shuffle buffer. (Currently, I sleep the main thread for a while before trying to get data from the data iterator.)
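An alternative to sleeping that I have considered is to pull a few elements before starting the timer. The sketch below is framework-agnostic and assumes two things that may not hold for every pipeline: that the pipeline can be iterated like a normal Python iterable, and that the first element is only produced once the shuffle buffer has been filled. `benchmark`, `warmup`, and `steps` are hypothetical names.

```python
import itertools
import time


def benchmark(pipeline, warmup=1, steps=100):
    """Time up to `steps` elements after discarding `warmup` elements.

    Returns (elements_timed, elapsed_seconds).
    """
    it = iter(pipeline)

    # Warm-up: discard the first elements so that any start-up cost
    # (assumed here to include filling the shuffle buffer) is paid
    # before the timer starts.
    for _ in range(warmup):
        next(it, None)

    start = time.perf_counter()
    count = 0
    for _ in itertools.islice(it, steps):
        count += 1
    elapsed = time.perf_counter() - start
    return count, elapsed


if __name__ == "__main__":
    n, seconds = benchmark(range(1_000_000), warmup=10, steps=100_000)
    print(f"{n} elements in {seconds:.4f} s")
```

This removes the guesswork about how long to sleep, but it only excludes the initial fill. It does not tell me whether the buffer is still being refilled in the background while the timed loop runs.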
I would really appreciate it if someone could shed some light on this!