What’s the proper place to store per-worker (process-local) state, such as buffer tensors? Is it the dataset object, since it is copied to each worker process and is not supposed to be shared?
E.g. my __getitem__ would like to store some buffers to be reused in the next __getitem__ call in order to save some CPU NumPy allocations.
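
To make the question concrete, here is a minimal sketch of what I have in mind. The class name BufferedDataset, the _scratch attribute and the fill/square "loading" step are all made up for illustration; in real code the class would subclass torch.utils.data.Dataset, which I left out so the snippet only needs NumPy. The assumption is that each DataLoader worker process ends up with its own copy of the dataset object, so a buffer allocated lazily inside __getitem__ would exist once per worker.

```python
import numpy as np


class BufferedDataset:
    """Map-style dataset sketch (hypothetical; would subclass torch.utils.data.Dataset)."""

    def __init__(self, num_items, item_shape):
        self.num_items = num_items
        self.item_shape = item_shape
        # Not allocated here: the buffer is created on first use, so each
        # process that calls __getitem__ gets its own one.
        self._scratch = None

    def __len__(self):
        return self.num_items

    def __getitem__(self, index):
        if self._scratch is None:
            self._scratch = np.empty(self.item_shape, dtype=np.float32)
        scratch = self._scratch
        # Stand-in for decoding/loading a sample into the reusable buffer.
        scratch.fill(float(index))
        # Intermediate work happens in place, without a new allocation.
        np.square(scratch, out=scratch)
        # The reduction allocates a new, smaller array, so the returned
        # sample is not overwritten by the next __getitem__ call.
        return scratch.sum(axis=0)


ds = BufferedDataset(num_items=5, item_shape=(4, 3))
first = ds[2]
second = ds[3]
print(first.shape, second.shape)
```

Note that the sketch never returns the buffer itself, only something computed from it; returning the buffer directly would let the next call overwrite a sample that has not been collated yet.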