DataLoader worker thread-local state

What’s the proper place for storing per-worker thread-local state, like buffer tensors? Is it the dataset object (since it’s copied into each worker process and is not supposed to be shared)?

E.g. my __getitem__ would like to store some buffers that can be reused on the next __getitem__ call, in order to save some CPU-side NumPy allocations.
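For concreteness, here is a minimal sketch of the pattern being described (class and parameter names are hypothetical). Because the DataLoader gives each worker process its own copy of the dataset, an attribute set on the dataset is effectively per-worker state:

```python
import numpy as np

class ReusedBufferDataset:
    # Hypothetical example of the pattern described above: a scratch
    # buffer stored on the dataset object and reused across __getitem__
    # calls instead of allocating a fresh array every time.
    def __init__(self, n_items, shape=(4, 4)):
        self.n_items = n_items
        # Eagerly allocated here; each DataLoader worker process gets
        # its own copy of the dataset, so this buffer is not shared.
        self.buf = np.empty(shape, dtype=np.float32)

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        self.buf.fill(float(idx))  # reuse the buffer, no new allocation
        return self.buf.sum()
```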

cc @VitalyFedyunin, who has been looking at the DataLoader recently.

My best suggestion is to wait for the functionality in https://github.com/pytorch/pytorch/pull/35795.
The second-best option is on-demand buffer allocation.
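On-demand allocation might look like the following sketch (names are hypothetical): the buffer is created on first use inside __getitem__, so it is never allocated in the main process and then duplicated into every worker.

```python
import numpy as np

class LazyBufferDataset:
    # Sketch of the on-demand buffer allocation suggestion: the buffer
    # is allocated lazily on the first __getitem__ call, which in a
    # multi-worker DataLoader happens inside each worker process.
    def __init__(self, n_items, shape=(4, 4)):
        self.n_items = n_items
        self.shape = shape
        self._buf = None  # allocated lazily in __getitem__

    def __len__(self):
        return self.n_items

    def __getitem__(self, idx):
        if self._buf is None:  # first call in this process
            self._buf = np.empty(self.shape, dtype=np.float32)
        self._buf.fill(float(idx))
        return self._buf.mean()
```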