It turns out my server was fully occupied when I was running the PyTorch version of my network. It now runs at ~100 examples/sec with ~80% GPU utilization, and the speed may improve further once the server has more free compute.
Original answer
The same problem appeared in my case too, except that instead of reading from lmdb I am reading the original JPEG files, and GPU utilization stays below 50%. I have also implemented my network in TensorFlow: there it runs at ~120 examples/sec, while in PyTorch it only reaches ~60 examples/sec.
My Dataset implementation is similar to ImageFolder. Multiple workers didn’t help; in fact, the more workers I use, the slower the loading gets. This happens in my TensorFlow implementation too, so I use only 1 worker there. In PyTorch, however, 0 workers is optimal for me, and even 1 worker drops the speed dramatically to ~30 examples/sec.
I guess one of the reasons is that you are creating a new db transaction each time __getitem__ is called, which adds a lot of overhead.
Also, sequentially loading the items with an iterator (as done in Caffe) would be faster than using f.get(key), but the current Dataset API doesn’t seem to support iterators…
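For reference, here is a minimal sketch of what reusing a single read transaction could look like, instead of opening a new one inside every __getitem__ call. The db path, the zero-padded integer keys, and the pickled values are assumptions about how the LMDB file was written, not something taken from the posts above.

```python
import pickle
import lmdb
from torch.utils.data import Dataset

class LMDBDataset(Dataset):
    def __init__(self, db_path):
        # Open the environment and one read-only transaction once, up front,
        # rather than calling env.begin() inside __getitem__ for every sample.
        self.env = lmdb.open(db_path, readonly=True, lock=False,
                             readahead=False, meminit=False)
        self.txn = self.env.begin(write=False)
        self.length = self.txn.stat()['entries']

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Assumed key scheme: records stored under zero-padded integer keys.
        raw = self.txn.get('{:08d}'.format(index).encode('ascii'))
        return pickle.loads(raw)  # assumes each value was pickled when the db was built
```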
Yes, that’s true! It’s much faster when you use the lmdb cursor and turn shuffling off. You can shuffle the dataset yourself beforehand and then read it back in order.
Maybe we don’t need the Dataset to support iterators. __getitem__ already takes an index argument, and if we turn off the DataLoader’s shuffle switch, the indices arrive in order and tell us when we have reached the end of the dataset. In that situation we can use the lmdb cursor to read our data in order, one record at a time; it is much faster, and GPU utilization goes up.
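A rough sketch of that idea, under the same assumptions as the earlier snippet (pickled values; keys are not even looked up here). With shuffling off, the index simply advances by one each call, so the cursor can ignore it:

```python
import pickle
import lmdb
from torch.utils.data import Dataset

class SequentialLMDBDataset(Dataset):
    def __init__(self, db_path):
        self.env = lmdb.open(db_path, readonly=True, lock=False,
                             readahead=False, meminit=False)
        self.txn = self.env.begin(write=False)
        self.length = self.txn.stat()['entries']
        # Walk the database with a cursor instead of seeking by key.
        self.cursor = self.txn.cursor()
        self.cursor.first()

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        # Only valid with shuffle=False: `index` arrives as 0, 1, 2, ...,
        # so we just return the record currently under the cursor.
        _, value = self.cursor.item()
        if not self.cursor.next():   # wrap around so the next epoch starts over
            self.cursor.first()
        return pickle.loads(value)
```

Note this only works with num_workers=0 (otherwise each worker would need its own environment and cursor), which matches the observation above that 0 workers was the fastest setting anyway.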
Yes, that should definitely work. One potential issue is that __getitem__ is then no longer doing random access as it is supposed to, but that shouldn’t be too much of a problem as long as one keeps it in mind.
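For completeness, a usage sketch under the same assumptions; the path and batch size are placeholders, and shuffle stays off as discussed:

```python
from torch.utils.data import DataLoader

# shuffle=False so indices (and the cursor) advance in order;
# num_workers=0 matches the observation above that extra workers slowed things down.
loader = DataLoader(SequentialLMDBDataset('/path/to/train.lmdb'),
                    batch_size=32, shuffle=False, num_workers=0)
for batch in loader:
    pass  # training step goes here
```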