I am trying to train a 3D conv net for video recognition. My expected data volume per batch is [32, 3, 32, 224, 224]. With DataLoader, loading is very slow for such a large volume: with 8 workers I wait ~300 seconds before it starts serving batches. The general timing pattern is a ~300 second wait for one iteration, then no wait for the next 4 training iterations, then another 300 second wait, and so on. I think it takes 300 seconds for all workers to prepare their batches, which is clearly far from practical. When I increase the number of workers, the wait time for the stalled iterations gets longer.
Note that I have enough memory, and this does not overload it. One concern is that I use an HDD, so seek time is undoubtedly large. Do you have any suggestions for improving the data loading scheme?
What kind of pre-processing are you doing on these video volumes? And are you using an efficient video loader (maybe OpenCV)?
Identifying the bottlenecks in your DataLoader using line_profiler might be helpful. https://github.com/rkern/line_profiler
When running with line_profiler, run the program with num_workers=0, so that the data loading runs in the main process and not in a separate worker process.
Thanks for your reply. My take is that the bottleneck is the disk seek time.
Here is what I do, more precisely. I first split the videos into frames. Then, in the data loader's get() function, I pick 32 frames for each video, load each frame with OpenCV, and aggregate them into a single tensor (3x32x224x224, where 32 is the sequence length). I have not used line_profiler yet, but according to iotop the read speed is limited to 5 MB/sec with 8 workers (the optimal number of workers) on a drive whose theoretical best is 350 MB/sec. I think it is all about disk seek latency (it is an HDD, unfortunately).
We are now working on a binary file format that can be used from PyTorch, and with it we have been able to increase the load speed to 34 MB/sec. If you are interested, we can discuss this further. I think it would be a good extension for PyTorch too.
Please let me know if you have any comments.
If it is an HDD, then yes, a binary format and sequential reads do make sense.
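To illustrate why one packed file helps on an HDD, here is a minimal sketch of the general idea. It is not the format mentioned above, whose details were not posted. It assumes every decoded frame has the same size in bytes and that a clip consists of consecutive frames; the names pack_video and read_clip are made up for this example. All frames of a video are written back to back, so a clip costs one seek and one contiguous read instead of one file open per frame.

```python
import os
import tempfile


def pack_video(frames, path):
    """Write equally sized raw frames back to back into a single file.

    `frames` is a list of bytes objects (hypothetical stand-ins for decoded
    frames). Returns the size of one frame in bytes.
    """
    frame_bytes = len(frames[0])
    with open(path, "wb") as f:
        for frame in frames:
            if len(frame) != frame_bytes:
                raise ValueError("all frames must have the same size")
            f.write(frame)
    return frame_bytes


def read_clip(path, start, length, frame_bytes):
    """Read `length` consecutive frames using one seek and one read."""
    with open(path, "rb") as f:
        f.seek(start * frame_bytes)
        buf = f.read(length * frame_bytes)
    if len(buf) != length * frame_bytes:
        raise ValueError("clip runs past the end of the file")
    # Split the contiguous buffer back into individual frames.
    return [buf[i * frame_bytes:(i + 1) * frame_bytes] for i in range(length)]


if __name__ == "__main__":
    # Tiny fake frames (12 bytes each) so the demo runs instantly.
    fake_frames = [bytes([i]) * 12 for i in range(10)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "video.bin")
        size = pack_video(fake_frames, path)
        clip = read_clip(path, start=3, length=4, frame_bytes=size)
        print(len(clip), "frames read in one contiguous read")
```

In a real dataset the bytes returned by read_clip would still need to be reshaped into a 3x32x224x224 tensor; that step is left out here.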
Sorry, I’m late to the party. Have you tried serving your data from a TensorDataset with mmapped tensors? Then initial load time goes to zero because the tensor is never actually loaded into memory as a whole. Something like:
torch.FloatTensor(torch.FloatStorage.from_file("frames.bin", True, -1)).view(-1, 3, 224, 224)
To help with disk seeking, write a scheduler that keeps skipping forward through your dataset (and then wraps around) with relatively small but random steps. Take the steps from a one-sided Cauchy or Student's t distribution to get a bunch of small steps with a few large skips.
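A rough sketch of such a scheduler, using only the standard library. The function names, the scale value, and the choice to make every step at least 1 are assumptions for illustration, not something specified above. Steps are drawn from a one-sided (half-) Cauchy distribution by inverse-CDF sampling, so most steps are small and a few are very large; positions wrap around at the end of the dataset.

```python
import math
import random


def half_cauchy_step(scale, rng):
    """Draw a non-negative step from a one-sided Cauchy distribution.

    Uses the inverse CDF: x = scale * tan(pi * u / 2) for u in [0, 1).
    """
    u = rng.random()
    return scale * math.tan(math.pi * u / 2.0)


def skip_forward_indices(num_items, num_samples, scale=4.0, seed=0):
    """Return `num_samples` dataset indices that mostly move forward.

    Each index is the previous one plus a random step of at least 1,
    wrapped around with a modulo when it passes the end of the dataset.
    """
    rng = random.Random(seed)
    pos = rng.randrange(num_items)
    indices = []
    for _ in range(num_samples):
        indices.append(pos)
        step = 1 + int(half_cauchy_step(scale, rng))  # assumed minimum step of 1
        pos = (pos + step) % num_items
    return indices


if __name__ == "__main__":
    print(skip_forward_indices(num_items=1000, num_samples=10))
```

Note that this samples with possible repeats and does not guarantee that every item is visited once per epoch. A list of indices like this could presumably be handed to DataLoader through its sampler argument, though that wiring is not shown or tested here.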