I have a lot of data saved in PersistentTensorDicts (PTDs), which I send to the GPU for processing.
However, there is a lot of idle time between reading data from disk, sending it to the GPU, and writing the results back.
I have been looking for a way to send batches of data from the PTDs to the GPU while the GPU is still processing the previous batch, so I can reduce this idle time.
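What I'm after looks roughly like a producer-consumer pipeline: a background thread loads the next batch while the current one is being processed. Here is a minimal sketch in plain Python, where `load_batch` and `process_batch` are hypothetical stubs standing in for my PTD reads and GPU work (with PyTorch I'd presumably also use pinned memory and `non_blocking=True` copies):

```python
import queue
import threading

def load_batch(i):
    # Stand-in for reading batch i from a PersistentTensorDict on disk.
    return {"batch": i, "data": list(range(4))}

def process_batch(batch):
    # Stand-in for GPU processing of one batch.
    return sum(batch["data"])

def prefetch_loader(num_batches, max_prefetch=2):
    """Yield batches while a background thread loads the next ones."""
    q = queue.Queue(maxsize=max_prefetch)
    sentinel = object()

    def producer():
        for i in range(num_batches):
            q.put(load_batch(i))  # blocks when the queue is full
        q.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (item := q.get()) is not sentinel:
        yield item

# Consume batches: each load overlaps with processing of the previous batch.
results = [process_batch(b) for b in prefetch_loader(3)]
```

But I'm not sure this is the right approach for tensordict, or whether the library already provides something for it.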
I found this distributed example and thought I could dedicate some workers to processing and others to reading/sending data, but I am failing to replicate the instructions there.
Does anyone have a pointer on which direction I should take to solve this problem? A link to an example would be great.
Best,