Alternatives for loading training data from a PostgreSQL DB

My training data are currently stored in a PostgreSQL DB.
Since the volume of data is huge, it is impossible to preload everything into memory before training.

I implemented my own Dataset that loads features from the DB one sample at a time as it is iterated, and I wrap it in a torch.utils.data.DataLoader during training.
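The setup looks roughly like the simplified sketch below. The table/column names, the connection string, and the way the feature column is decoded are placeholders, and it assumes row ids run contiguously from 0 to len-1; my real code differs in those details, but the one-SELECT-per-sample pattern is the same.

```python
import psycopg2
import torch
from torch.utils.data import Dataset, DataLoader

class PostgresDataset(Dataset):
    """Map-style Dataset that fetches one row per __getitem__ from PostgreSQL."""

    def __init__(self, dsn, table="samples"):  # placeholder table name
        self.dsn = dsn
        self.table = table
        self._conn = None  # opened lazily so each DataLoader worker gets its own connection
        conn = psycopg2.connect(dsn)
        try:
            with conn.cursor() as cur:
                cur.execute(f"SELECT count(*) FROM {table}")
                self._len = cur.fetchone()[0]
        finally:
            conn.close()

    def _connection(self):
        # Re-open the connection inside the worker process if needed.
        if self._conn is None or self._conn.closed:
            self._conn = psycopg2.connect(self.dsn)
        return self._conn

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        # One SELECT per sample -- this per-row round trip is what I suspect is too slow.
        # Assumes "features" is an array column that psycopg2 maps to a Python list.
        with self._connection().cursor() as cur:
            cur.execute(
                f"SELECT features, label FROM {self.table} WHERE id = %s",
                (idx,),
            )
            features, label = cur.fetchone()
        return torch.tensor(features, dtype=torch.float32), torch.tensor(label)

dataset = PostgresDataset("postgresql://user:pass@localhost/train_db")  # placeholder DSN
loader = DataLoader(dataset, batch_size=256, num_workers=32, pin_memory=True)
```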

When I pass a low value (1–32) for num_workers to the DataLoader, training runs normally, but the loading throughput is low, so GPU utilization is very low.

When num_workers reaches or exceeds 64, the loading rate stops increasing, and I found that the main process is pinned at 100% of one CPU core.

My guess is that the DataLoader in the main process has to do a lot of communication and synchronization with its worker sub-processes. The more workers there are, the more synchronization overhead, so receiving data from the workers becomes the bottleneck in the main process.

I don’t know whether the above analysis is right.
If it is, could you please teach me the correct way to load data from a SQL DB?
Should I maybe use an in-memory DB such as MongoDB/Redis instead, so that I don’t need so many workers to load the data?
Thanks!