To load my data, I need to use licensed software that has to be allocated when each worker process starts and freed when that process shuts down.
For the init, this is easy using the `worker_init_fn` argument of the `DataLoader` class, but for shutdown, I was not able to find a working solution. I tried implementing it in the `__del__` method of my dataset, but this does not appear to work. Any suggestions?
This would be much easier if I could use multithreading instead of multiprocessing for the DataLoader workers, but this is not available yet. It appears to be in the works; see the GitHub issue "[RFC, Tracker] DataLoader improvements" (pytorch/pytorch, Issue #41292).