(reposting this Meta-internal question and response)
We have a decompression engine on DSP and would like to know if it can be used on ExecuTorch models. So far we have been using
BufferDataLoader
by specifying a buffer address and buffer size. Is there any other interface to load the model chunk by chunk so that we can decompress the model on the fly?
You could write your own DataLoader subclass to load the data chunk by chunk, though it does require random access into the decompressed data to satisfy the load() behavior. This is usually a pain when loading from compressed data, because naive compression schemes will require re-decompressing from the beginning of the stream every time.
Block compression can avoid some of that by having sync points inside the compressed data, so that you can start decompressing from further in the stream. It typically requires extra metadata and compression logic, though.
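To make the block-compression idea concrete, here is a self-contained sketch of a chunk-decompressing loader. It uses simplified stand-ins for ExecuTorch's DataLoader and FreeableBuffer (the real interfaces live in the runtime headers and differ in detail), and the "codec" is just an XOR standing in for a real block compressor. The point it illustrates: with independently decodable blocks, load(offset, size) only has to decompress the blocks that overlap the request, not the whole stream from the beginning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for ExecuTorch's FreeableBuffer: the real class holds
// a raw pointer plus a free function; a vector keeps this sketch self-owned.
struct FreeableBuffer {
  std::vector<uint8_t> data;
};

// A toy loader over block-compressed data. Each kBlockSize block is
// independently decodable (a "sync point"), so random access only pays for
// the blocks that overlap the requested window.
class BlockDecompressingLoader {
 public:
  static constexpr size_t kBlockSize = 64;

  explicit BlockDecompressingLoader(std::vector<uint8_t> compressed)
      : compressed_(std::move(compressed)) {}

  // load(offset, size): decompress only the overlapping blocks, then trim
  // to the exact window the caller asked for. Assumes size >= 1 and that
  // the window lies inside the decompressed data.
  FreeableBuffer load(size_t offset, size_t size) const {
    size_t first_block = offset / kBlockSize;
    size_t last_block = (offset + size - 1) / kBlockSize;
    std::vector<uint8_t> out;
    out.reserve((last_block - first_block + 1) * kBlockSize);
    for (size_t b = first_block; b <= last_block; ++b) {
      append_decompressed_block(b, out);
    }
    size_t start = offset - first_block * kBlockSize;
    std::vector<uint8_t> window(out.begin() + start,
                                out.begin() + start + size);
    return FreeableBuffer{std::move(window)};
  }

 private:
  void append_decompressed_block(size_t block,
                                 std::vector<uint8_t>& out) const {
    size_t begin = block * kBlockSize;
    size_t end = std::min(begin + kBlockSize, compressed_.size());
    for (size_t i = begin; i < end; ++i) {
      out.push_back(compressed_[i] ^ 0x5A);  // stand-in for a real codec
    }
  }

  std::vector<uint8_t> compressed_;
};
```

In a real DataLoader subclass you'd return the decompressed window through a FreeableBuffer whose free function releases the scratch memory, and you'd keep the per-block metadata (offsets of the sync points) alongside the compressed stream.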
Another question: after the model is loaded, can we reclaim the buffer memory?
Depends on the model. The ET runtime uses the FreeableBuffer objects returned by DataLoader to manage program data lifetime. When they’re freed, the data behind that FreeableBuffer can be reclaimed. But the standard BufferDataLoader can’t free anything, because a) it doesn’t know how its memory was allocated, and b) even if it knew that the memory was allocated with malloc(), there’s no way to free a hole inside a malloc buffer (apart from OS tricks like madvise()). That was the whole motivation behind DataLoader and FreeableBuffer.
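The FreeableBuffer pattern described above can be sketched in a few lines. This is a standalone illustration, not the real ExecuTorch class (which has more machinery), but it shows the key idea: the loader that hands out the buffer also supplies the free function, and when no free function is supplied, as with a BufferDataLoader that doesn't own its memory, Free() is a no-op.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified sketch of the FreeableBuffer idea; names mirror ExecuTorch's
// but this is an illustration, not the real interface.
class FreeableBuffer {
 public:
  using FreeFn = void (*)(void* context, void* data, size_t size);

  FreeableBuffer(void* data, size_t size, FreeFn free_fn,
                 void* context = nullptr)
      : data_(data), size_(size), free_fn_(free_fn), context_(context) {}

  ~FreeableBuffer() { Free(); }

  // No-op when no free function was provided: the loader that created the
  // buffer decides whether (and how) the memory can be reclaimed.
  void Free() {
    if (free_fn_ != nullptr && data_ != nullptr) {
      free_fn_(context_, data_, size_);
    }
    data_ = nullptr;
    free_fn_ = nullptr;
  }

  void* data() const { return data_; }
  size_t size() const { return size_; }

 private:
  void* data_;
  size_t size_;
  FreeFn free_fn_;
  void* context_;
};
```

A custom decompressing DataLoader would pass a free function that releases its scratch allocation, which is exactly what lets the runtime reclaim segments the model no longer needs.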
As for when those buffers are freed: today, the only time this happens is for backend delegate data blobs, during model load. If, after initialization, the backend decides that it doesn’t need its data blob, it can free it. XNNPACK does this, though some backends do not.
Apart from those backend data segments, all other loaded data is required for model execution, and needs to remain resident for the lifetime of the Method. Although we could add some kind of API for freeing the Method data, it would be equivalent to destroying the Method itself, which is already possible today. So if you need to reclaim data between executions, you could destroy the Method – but there’s the obvious tradeoff with the latency of re-loading the Method the next time you want to execute it.
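The destroy-and-reload tradeoff can be sketched with a toy Method whose lifetime pins its planned memory. The ToyMethod name and g_resident_bytes counter are hypothetical stand-ins for illustration; in real ExecuTorch code the equivalent would be dropping and re-creating the Method (paying the load latency again).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Bytes currently pinned by live methods (illustration only).
static size_t g_resident_bytes = 0;

// Toy stand-in for an ExecuTorch Method: constructing it pins its planned
// buffers, destroying it releases them.
class ToyMethod {
 public:
  explicit ToyMethod(size_t planned_bytes) : planned_(planned_bytes, 0) {
    g_resident_bytes += planned_.size();  // simulate load/residency cost
  }
  ~ToyMethod() { g_resident_bytes -= planned_.size(); }

  int execute() { return 42; }  // stand-in for running the model

 private:
  std::vector<uint8_t> planned_;
};
```

Wrapping the method in a std::optional and calling reset() between executions reclaims the memory, at the cost of re-constructing (re-loading) it before the next run.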